
Research Session Log

This document contains the log of project-1. Project-1 is a new attempt to figure out, essentially from scratch, how to create a general world modeling algorithm. The log documents day-to-day research activities and results.

The Overall Story gives a brief history of the project.

If you are reading this document to get up to date with the project, read it backwards (though still look at The Overall Story, potentially also backwards), and read references as necessary. Most old content is superseded.

Table of Contents


1. The Overall Story πŸ”—

[2024-08-12 Mo]–[2024-09-05 Do] The headings from 2024-08-12 Thomas to 2024-09-05 - Matthias G. Mayer - Propositional Logic Function present a chronologically sorted account of the research sessions done. This document is written to facilitate the process of understanding. Content might be wrong, unimportant, and is definitely unpolished. Notation used is potentially explained in the Phase I Destillation (a typst document) without explicit reference.

[2024-10-29 Di] In A Simple World-Model Model we started to think from scratch about what a world model should look like. We create a simple formalism.

We then think about the problem of how, given an almost correct world model, we could go about fixing it. This problem seems important because it seems much easier than constructing a world model from scratch, while solving it still seems to require some breakthroughs in how to think about world models.

[2024-10-31 Do] Then in Program Search is the Core we notice that a world model contains functions/programs. E.g. the <update function/physics> in our world model seems most naturally implemented as a program. Only much later, in Taking a step back and understanding Program Search, do we manage to realize why a program seems good. It seems to be a lot about how to get the right inductive bias, though we are probably still confused.

[2024-11-02 Sa] We go on and work on solving program search for NAND circuits. The motivation here is that NAND circuits are very simple (only one component), yet powerful (they can represent any function from a fixed-size bitstring). We quickly get distracted by an idea about making Continuous NAND Circuits and optimizing them with SGD, before validating the basic approach. We get a basic version that is able to correctly learn AND and OR from a 2-layer NAND circuit.

[2024-11-08 Fr] For several days we have failed to run basic tests on larger NAND circuits. Instead we work on random ideas about how to make the setup better, especially how to generate the NAND circuits in the best way. When we finally perform these tests, the setup does not work, probably because there are local optima, but we haven't verified this.

[2024-11-09 Sa] We realize that we are doing research kind of wrong. We are way too deep in the weeds, without having good models, or good arguments within those models, for why these particular weeds are good to scythe. See Aborting Continuous NAND circuits and Research as a Stochastic Decision Process.

In Taking a step back and understanding Program Search we take a step back and think about how to do program search more generally again.

[2024-11-10 So] We now take a step back even further, and realize that we actually don't have good models of very basic things, like what we mean by program search, and even what we mean by program. In Factoring the Update Function of a World Model we make important conceptual progress.

[2024-11-13 Mi] In A World to learn a Factored Update Function in we consider what would make a good toy world for learning a factored update function. We choose cellular automata, and start a basic implementation.

[2024-11-14 Do] I decide that learning miniKanren is very important, and spend the day on it. Output and reasoning here.

[2024-11-16 Sa]

[2024-11-17 So] We looked at baliman (see Baliman).

[2024-12-17 Di] For one month I mainly did skill-building in programming. I did some Advent of Code, bit operations, a Lisp game-engine investigation, WebGPU, and more.

[2024-12-29 So] I spent a week learning computer graphics and basic webdev. The goal is to make it easy to create interactables to facilitate the research process. See the Learning Morphology Through Play project.

2. Typst: The Science Algorithm πŸ”—

2.1. About this Document πŸ”—

3. 2024-08-12 Thomas πŸ”—

3.1. Plainly coded AI πŸ”—

It seems that an AI that could build a full AGI could also kill us if unaligned. We need to get useful cognition out before that.

World Substates vs. Transition Structure

In Conway's Game of Life, gliders are important. In Minecraft, wood is important.

Glider instances and wood instances are substates of the world state.

This is related to the difference between a static instance of a datastructure which just contains some data and the algorithm that operates on instances of this datastructure.

E.g. if the world is a bitstring B, then a substring of B is a substate of the world state.
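As a tiny illustration (the encoding is ours, not from the log): a substate of a bitstring world can be modeled as a substring together with its position.

world = "0110100"
substate = (2, "101")           # (offset, substring)
assert world[2:2 + 3] == "101"  # the substate occurs in the world state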

3.2. World Model Specification πŸ”—

In Conway's Game of Life, gliders are important. In Minecraft, wood is important. Hypothesis: We as humans have a specific cognitive architecture that allows us to represent concepts. Specifically, a glider would be stored as a particular kind of set of substates: all the states that the glider can be in.

We as humans have a cognitive architecture where we can refer to the general idea of a glider somehow; an algorithm should be able to do the same thing.

We know that they are. Therefore we know that in a cognitive system that models the Game of Life or Minecraft, it

The cognitive system has a list of concepts; one of them should be glider/wood.

We could do the same for the transition structure.

3.3. World Program πŸ”—

WorldState : Type  -- a state of the world
Substate : Type    -- a substate of the world state

structure Concept where
  substates : Set Substate

-- e.g. the Minecraft tree concept:
tree := Concept { substates := ALL_TREE_BLOCK_CONFIGURATIONS }

Object-based World Program

Instead of using the world state graph, we can have the world model consist of a collection of objects. Each object is a collection of substates, together with a map describing the transition rules between the substates, and some instance properties.

E.g. here would be the Minecraft tree instance:

Concept {
  substates := ALL_TREE_BLOCK_CONFIGURATIONS,
  infraTransitionRules := [],
  instanceProperties := { position := SOME_VEC3 }
}

Example

A world state is a collection of objects.

from enum import Enum

GliderPhase = Enum("GliderPhase", "ONE TWO THREE FOUR")
Direction = Enum("Direction", "BOTRI TOPRI BOTLE TOPLE")

class Glider:
    def __init__(self, pos: tuple[int, int], phase: GliderPhase, dir: Direction):
        self.pos = pos
        self.phase = phase
        self.dir = dir

    def evolve(self):
        # a glider travelling bottom-right moves one cell down and right
        if self.dir is Direction.BOTRI:
            self.pos = (self.pos[0] + 1, self.pos[1] + 1)
        # advance to the next phase, wrapping around
        self.phase = GliderPhase(self.phase.value % 4 + 1)
structure Glider where
  substates : Set Substate  -- placeholder type
  pos : Pos                 -- placeholder type
  updateMap : Glider -> Glider

def Glider.update (g : Glider) : Glider :=
  g.updateMap g

3.4. How can we handle inter object interactions? πŸ”—

In Minecraft, TNT destroys blocks around it when it blows up. Fire ignites wood blocks that are close by. In both cases the exact effect depends on the surrounding blocks: dirt has lower TNT resistance than stone, and only burnable blocks will be ignited.

Often interactions are localized like the above two examples.
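A minimal sketch of such a localized interaction (block names, numbers, and the resistance model are illustrative, not from the log): an explosion only visits nearby blocks, and whether a block breaks depends on its resistance and the remaining blast power.

RESISTANCE = {"dirt": 0.5, "stone": 6.0}

def explode(world, center, power=4.0):
    """world: dict mapping (x, y) -> block name; destroys weak nearby blocks."""
    for (x, y), block in list(world.items()):
        dist = abs(x - center[0]) + abs(y - center[1])  # blocks outside the radius are untouched
        if dist <= power and RESISTANCE[block] < power - dist:
            del world[(x, y)]  # block destroyed
    return world

world = {(0, 0): "dirt", (1, 0): "stone", (5, 5): "dirt"}
print(explode(world, (0, 0)))  # nearby dirt is destroyed; stone and distant dirt survive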

3.5. Good properties πŸ”—

The rules of the world are encoded in the objects, which intuitively is β€œsmaller” than describing the interactions with a world state graph.

3.6. Notes πŸ”—

The world state graph seems to be isomorphic to a lookup table (where the key is the tuple (world state, action)).

3.7. TODO X πŸ”—

How do substates relate to

3.8. Next Steps πŸ”—

  • Distill the core insights into typst
  • Completely formalize objects in the 1D Bit Automata.
    • Understand at all how different objects would interact with each other.
      • One part of this is probably collision detection
    • Formalize how objects can be combined
      • algebra of objects. E.g. type theory has an algebra on types. Product type is an operation that takes in two types and returns a new type.
      • Generally, what I mean by algebra is that I have a set O (the objects) and a set of functions \(O^{n} \to O\) (where n can vary and is not fixed) that can be applied to the objects and are closed over O. And usually it's implied that we can discover rules that hold.
      • We want to be able to combine different objects, automatically, such that all attributes of the original object get combined in a meaningful way.
        • E.g. if I know what it means for a glider to collide with something I want to be able to create a glider formation object of multiple gliders that has automatically defined the β€œis colliding” functionality.
  • How does this relate to algebraic structures
  • How does this relate to the algebra over programs (FP, John Backus's lecture, etc.)
    • With an algebra over programs could you solve
  • How does functional analysis relate? Seems very related.
  • Can you do something like construct your program in a specific algebra of programs?
  • Human reasoning seems to use some algebra of concepts. E.g. I can talk about a milk bottle that has two openings connected to different internal components.
  • Concrete task:
    • Read Backus's lecture on FP
    • Look at Gathman script
    • Create a runnable program that defines the 1D Bit Automaton at the low level, purely in terms of objects.
    • Define the functions that can meaningfully combine any of the low-level objects
      • This algebra should be complete, in that any object that can exist is constructible from some set of atomic concepts and object operations in the algebraic structure.
    • Learn the very basics of Smalltalk to evaluate if using Smalltalk would make programming so much easier that the switch is worth it.

4. 2024-08-13 - IA - Object World Network πŸ”—

4.1. About the Name πŸ”—

The name Object-World-Network is interesting because it makes your brain conceptualize the object relationships as a network, without me already having a good idea of how to model object interactions as a graph.

4.2. Object Communication πŸ”—

How do objects communicate with each other? Each object could send messages to adjacent objects. Each object could know how it interacts with other objects. Specifically, each object could define an interface, meaning a shape that this object knows how to react to, if another object that it interacts with has this particular shape. E.g. in Minecraft a stone can know how much force is required to break it. A pickaxe then exerts force onto the stone, but so does TNT. The stone need not know whether a pickaxe or TNT breaks it, just how much force is applied to it. Generally, programming concepts like interfaces, Smalltalk messages, network protocols, and so on might provide a useful conceptual framework for thinking about object interactions.
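A minimal sketch of such an interface (class and method names are illustrative, not from the log): the stone only knows its resistance and reacts to applied force, independent of the force's source.

class Breakable:
    """Interface: an object that reacts to applied force."""
    break_resistance: float

    def apply_force(self, force: float) -> None:
        # the source of the force (pickaxe, TNT, ...) is irrelevant
        if force >= self.break_resistance:
            self.on_break()

    def on_break(self) -> None:
        print("breaks")

class Stone(Breakable):
    break_resistance = 30.0

stone = Stone()
stone.apply_force(35.0)  # could come from a pickaxe swing or a TNT blast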

4.3. What are Objects πŸ”—

We define objects indirectly. We want to find a specific structure that perfectly models the world. This structure can be a set of objects \(O\), together with an update function. The update function is hardcoded to simply call the update function on each element in \(O\). \(O\) contains what we call objects.

An object only needs an update function. That is the only requirement. However, each update will usually query \(O\) (\(O\) is read only) to know how it should update itself.

structure Object :=
  ObjectStructure : Type
  -- update the state of the object
  update : Object → Set Object → Object
  -- does calling the object's update function give the correct result in the current context?
  updateValid : Set Object → Bool
  -- deconstruct the object by constructing the concrete objects that it is composed of
  downTranslate : Object → Set Object

5. 2024-08-13 Gurkenglas πŸ”—

Possible Topics:

  • Why is tacit programming good?
    • Concrete examples.
  • How to model the world with only objects.
  • Why is Clojure Spec good?

5.1. What's the point of pointless style πŸ”—

incList = map (+ 1)
incList' x = map (+ 1) x
incList'' x = map (\y -> y + 1) x
foo a b c d e = e b d a c
foo = ((flip . (flip .)) .) . flip (flip . (flip .) . flip . flip id)

foo a b = [b, a]
foo = flip (:) . return

There are factorially many permutations of the input arguments, meaning that there are

Point free style is good in practice. ~Gurkenglas

Function application is

  • Point free style is good because it reduces the size of the search space.
  • How to do program search by algebraic manipulation on programs.

pointfree.io

You could prove whether the Goldbach conjecture is true by being able to calculate whether doing proof search for it is \(O(1)\) or \(O(n)\).

5.1.1. Point free style support πŸ”—

  • You can write things in point free style in Lean.

5.2. Things to do: πŸ”—

  • Understand point-free style.
    • Understand how to do algebra with programs.
  • Understand SKI Combinators
    • I think their purpose is to be simple to reason about formally. That's probably (ask GPT)

5.3. Algebraic Programming πŸ”—

Consider that we have the program \(\left( (f \circ g) \circ h \right)\) where \(f\), \(g\), and \(h\) are programs and \(\circ\) is the function composition combinator. We can algebraically transform this program: \[\left( (f \circ g) \circ h \right) = \left( f \circ (g \circ h) \right)\quad\text{because composition is associative.}\]

In general, if we prove some identity like \(h = f \circ g\) for some functions \(f\) and \(g\), then in any expression in the current context we can substitute \(f \circ g\) with \(h\). In this case we can transfer properties of \(h\) (i.e. knowledge about \(h\)) to reason about \(f \circ g\).

The above is a trivial example of iteratively rewriting a program in a way that preserves functional equality. I expect there are non-trivial ones. Things that seem possible:

  • Rewrite the code such that it does the same thing but is more performant (compilers probably do something similar).
  • Iteratively construct a program according to a target specification.
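As a small illustration in Lean (ours, not from the session), the associativity rewrite above holds definitionally, so the proof is just rfl:

-- composition is associative, so ((f ∘ g) ∘ h) may be rewritten to
-- (f ∘ (g ∘ h)) in any expression without changing behavior
example {α β γ δ : Type} (f : γ → δ) (g : β → γ) (h : α → β) :
    (f ∘ g) ∘ h = f ∘ (g ∘ h) := rfl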

One feature of this style of reasoning is that we don't need to quantify over arguments. I intuit that this is an important property. Instead of needing to reason on the argument level, where we would potentially have properties like \(\forall x \in A, f(x) = g(x)\), we instead reason about what properties a combinator expression has, usually based on what properties the operators and operands have.

Somehow it seems that the space of possible combinator combinations is smaller than the space of all functions and arguments. After all, for every combinator combination we could expand it such that it does talk about its arguments. But then there is just more stuff to reason about. It seems that we can "compress" our reasoning into the space where we only consider properties of programs as a whole, and properties of combinators.

5.4. Next Steps πŸ”—

Possible:

  1. Find concrete algebraic program manipulations, useful for constructing programs.
    • Something like "my function needs property P, and using a combinator would make it have P, because the function has property set X and this combinator application has property Y, which tells me what properties the resulting program will have, based on the combinator and the properties of the combined programs".
  2. Develop an algebra of objects.

6. 2024-08-14 πŸ”—

6.1. Algebra πŸ”—

In this document we use the term Algebra as used in Universal algebra: A set A together with a collection of operations on A.

6.2. Ideas πŸ”—

Object World Network

  • The algebra of objects could be such that

Algebra of world descriptions

  • A formalism/algebra that allows transformation and combination of different world descriptions, and guarantees that any resulting world description still perfectly describes the world.
    • This could go so far that world descriptions can be structurally dissimilar, and then the structures themselves can be combined.
    • Objects are part of the world descriptions, so an algebra on them is useful.

7. 2024-08-14 - Object Interactions πŸ”—

7.1. Cartesian Meta Objects πŸ”—

Objects could model themselves. Specifically, when we run the simulation of the world, each object observes how it interacts with other objects. This self-knowledge and self-modeling would transcend specific object instances. A glider in the Game of Life could know that if it flies into a wall, it will die. But this knowing is performed by a Cartesian system that observes this particular glider and its disintegration (or potentially even multiple gliders). So there is a sort of Meta Glider entity that can get to know gliders, looking at a glider from a Cartesian perspective. This is valid if we are talking about a simulation of the world, because there Cartesianness can be easily enforced.

7.2. Other Ideas πŸ”—

  • Bacteria under microscope.
    • We already know that the object is a good component of the world model.

This is about trying to make the perfect physics simulation that I already have faster, by building more abstract world models (objects that interact with each other).

Another thing you need to figure out

  • Try all possible algebraic manipulations to get an object.

Object World Networks are about

7.3. Start with Lossless Computational World Model Compression πŸ”—

Imagine you have a perfect model of the world that is, however, too computationally inefficient. I think we want to optimize this model first for lossless computational compression, i.e. making the model run with less compute while still being perfectly accurate. This makes it easier to evaluate if you're going in the right direction: simply check if your model is still perfect and how much faster it got.

Once you try to find a lossy compression, you additionally need to be able to properly evaluate whether the trade-off of losing accuracy in the model is worth the efficiency gain in running the algorithm.

8. 2024-08-15 - Julia πŸ”—

8.1. Scratchpad Julia πŸ”—

brainstorm

  • what johannes doing
  • what julia doing
  • overlapping interests
  • conflicting models

figure out alignment ??? trying to think about alignment problem from scratch-ish

current leaf nodes / threads of thinking

  • homo sapiens cultural foom transition
  • is there an "intelligence core" or "generality-binary"
  • causal & telic tags which allow shape for "agency"

some general topics of interest

  • corrigibility/friendliness, attractor
  • "intelligence introspection"
  • abstractions

predictions: world state graph, world object networks, etc. are downstream of the "intelligence core"

what's your process: different answers

8.2. Johannes Scratchpad πŸ”—

8.2.1. What Johannes works on πŸ”—

  • World state graph
    • Regions/Properties
    • Onions
  • Layered Abstraction
  • Science Game
  • World object networks
  • Interaction Nets

conversation meta

  • "people don't do enough math"
  • "ideas are not pushed to the place where they need to be pushed in terms of formality"
  • "too fast math, too formal"

Julia (notes by Johannes)

  • Culture can be thought of as creating a database of knowledge that can be passed through generations, being continuously updated such that people in later generations are in a context where they have more knowledge than their parents and can therefore build upon that knowledge.
  • Cultural evolution is faster and can give qualitatively different things (e.g. a computer) from what natural evolution can achieve.
  • Thinking about this is an example of how optimization in a system can change drastically.

More Julia notes: the idea of an "intelligence core" is quite interesting.

What's the difference between a human and a monkey? Is the human very much more "intelligent" than the monkey? I predict that on most internal aspects human & monkey are at the same level of capability. I think the difference is probably quite localizable? Might give insights about minds/intelligence.

"Why is the idea of an intelligence core relevant?" Load-bearing on timelines. Exploring the idea of "optimization context changes" is forward-chainy and relevant for alignment. Load-bearing on the "shape" of the AI which wins; load-bearing on what alignment is.

Johannes's Questions

  • Go through everything you thought about at a higher level to get an overview of what all the things were that you thought about.
  • What is the thing that saves the world to do?
  • Why is the idea of an intelligence core relevant?
  • Why is this overall relevant?

8.3. The core of efficient reasoning πŸ”—

BFS finds a path in any graph. In general brute force search can do optimal reasoning.

Therefore the core of intelligence isn't so much about how to be able to reason about the world, but how to do so efficiently.

(There are still some other issues though, like embedded agency.)

8.4. Abstractions πŸ”—

  • Abstractions are important.
    • Abstractions need to capture what you care to model at sufficient detail, such that you can perform the computations that you care about (e.g. simulating the world) at a sufficient level of accuracy.
    • you need to be able to iteratively update your abstractions.

8.5. Human Clock Ontology is unbroken by quantum Mechanics πŸ”—

Humans can handle ontology shifts well because they don't even talk about certain things. For example, when I think about a clock, I think about it in terms of gears. So if I now realize that atoms are not real and there is only a quantum wave function, it wouldn't change how I think about a clock, because I was already thinking about it in a conceptual framework that doesn't reference atoms.

Looking at software engineering, we can see a bunch of different strategies for making a system robust to changes.

  • Git-version your system, have tests, and automatically roll back to the last version where the tests pass if the current version breaks.

8.6. Self Culture πŸ”—

  1. Think something.
    • This might involve drawing on a whiteboard, speaking, writing in a text document, writing down math, writing down pseudocode, doing interactive programming, etc.
  2. Compress your thoughts.
  3. Replace the original thought with the compressed version such that you have a larger capacity for new thoughts.

8.7. Algebraic concepts πŸ”—

  • Somebody can tell you how to c
  • GPT is trained and now its knowledge is fixed. Actually, it can learn new things during a "context window", but this will be forgotten afterwards. Even if GPT says that it learned something new, it hasn't in the long run. And just having this one conversation be in the new training dataset would probably not be enough data to make GPT get this right in general.

8.8. LLMs Don't have online learning πŸ”—

LLMs can't observe some data and then update their ontology, without destroying it, using only a few datapoints.

8.9. Language Forces you to become smarter πŸ”—

The weapon damage of speech scales with intelligence, not with strength.

Can you have an intelligence without language? A dictionary allows you to have short names that map to larger datastructures. This seems to be a generally useful thing. So something like this would show up in the internal structure of an AGI.

Inventing names is really useful. It's underused. When you actually need to explain something, you need to recursively unroll all the concepts, and give them names, to iteratively build up the concepts that you want to communicate, until you finally have enough concepts in the other person's brain.

9. 2024-08-16 - Julia πŸ”—


concrete problem

brainstorm

  • causal dag + teleology. agent: (A->B)->A. abstraction structure.
  • corrigibility
  • more philosophy regarding β€œintelligence core”

method of "following the causality of belief"

--- corrigibility

9.1. Optimization attractor state πŸ”—

  • How can we create a system that optimizes in the direction we want?

Q

  • What is a goal-directed agent?

9.1.1. TODO [POST] πŸ”—

Humans have the alignment circuitry. I can predict what an alien creature on another planet would want. They don't want me to send a nuclear missile their way. Probably they want me to beam some technology information to them. Maybe, or maybe not, they want me to conquer the universe for them and kill everybody else.

I can imagine what I would do if I were a psychopath that really likes killing people. I can start to plan the best way to kill the most people without getting caught. It's quite scary how easy it is to do this.

That is the power of consequentialist reasoning.

What we did above is set our goal to what the aliens want.

9.1.2. What is an Attractor πŸ”—

  • We can assume that we know where we are and where we can go, and then try to figure out where we need to go (uncertainty over the goal).

9.1.3. Misc πŸ”—

  • Myopia is about constraining the reasoning of an unaligned system so that it is not dangerous.
    • This seems different from the previous thread of figuring out how the system can infer the right goal.
9.1.3.1. Good for the User πŸ”—

Strictly more states can be reached.

9.1.3.2. Consequentialist reasoning πŸ”—
  • This is what we want the AI to do. It's not necessarily that we want the AI to steer the world. The AI could be myopic in the sense that it talks to the user but does not think about how it affects the real world through talking to the user.

9.2. (Julia) motivating intuitions πŸ”—

Fragility of extreme states

  • If you have a "goal-directed agent" and want to do X, aiming it at \(X + \varepsilon\) is usually not good enough. Why?
    • Depends on the metric. Which metric causes this?
    • Ontology issues with this formulation.
  • If you have some variables \(\left( x_{i}\sim X_{i} \right)_{I}\), optimizing \(\left( x_{j}\sim X_{j} \right)_{J}\) pushes \(\left( x_{i} \right)_{I - J}\) into "extreme" (low probability?) values. Why?
    • Okay, I have some semi-formal intuitions I could write down if load-bearing. Variables are related, and optimizing \(x_{j}\) pushes it into extreme values, which forces \(x_{i}\) into extreme values.

Extreme values are reached only for variables not specified in the goal (implicitly this means they can take any value). If I only add a small

Intuitions

  • self-awareness of "flaws": the thing knows it was initialized close to, but not at, an attractor fix-point. it will try to reach this point
    • property X (X is the optimal algorithm/goal): "I" "probably" am close to X, but not X. I will act with this in mind, and avoid making "mistakes" which might be due to my distance from X.
  • something which this is not: "uncertainty over the utility function"
    • you don't want the AI to open up the human or manipulate the human to give G, and then the AI optimizes for G.
  • apparent paradox: intuitively we do not desire our corrigible agent to "want" anything. But we desire it to "want" to be corrigible.
    • perhaps we do not desire our AI to "want" to be more corrigible, simply to "assist" the "user" in making itself more so.
      • How could one get around the standard notion of instrumental goal conservation?
  • We want our "user" to be "better off" for having the AI assist it, rather than not.
  • the AI needs to employ internal consequentialist reasoning, but this needs to be careful not to "optimize" in a way which could cause "mistakes" to propagate.
  • have some "user". want the AI to try and "empower" the "user".
    • don't want the AI to "steer" through "manipulating" the "user"

10. 2024-08-16 - Yannick πŸ”—

  • Typos are distracting and make it hard to read the text (for Yannick).
    • Doing at least a rough spell check pass before showing something to Yannick seems good if possible (but it should not stop me from showing something).
  • It was quite fun and engaging having Yannick read the text out loud and comment on it.

11. 2024-08-18 - Julia πŸ”—

abstraction initial thoughts/queries/intuitions

  • we require abstraction to operate, if we are bounded minds.
  • abstraction requires 'structure' in the world. there are worlds 'too entropic' (or 'adversarial') where abstractions don't work well. these worlds probably don't have minds.
  • abstractions (in humans) are (probably) acquired empirically, anchored with a bias.
    • I don't think the human algo will have 'justified reason' (proof?) to pick up the abstractions it does.
      • Motivated by natural-selection intuitions. Natural selection finds a way for humans to pick up abstractions empirically.
    • genetics encode learning biases.
      • what to learn (faces, language, spiders etc)
      • HOW to learn. What does a good abstraction look like.
  • humans don't have ontology shifts, we have ontology additions?
    • we have overlapping abstractions

how do we obtain our abstractions? what do we mean by "relevant"? the things we "care about" are usually other abstractions.

how do we obtain abstractions for high actuation? What does it look like for an abstraction to be for high actuation?

  • predict
  • say novel things about our other abstractions. relations towards other abstractions.

api-thing. say "move arm left" and it does the rest. "move arm left" is an abstraction. it's a "high level" abstraction. what does it mean for an abstraction to be "high level"? "move arm left" uses the "turn joint" abstraction. in order to have "pure levels" it needs to be a tree, but it's not a tree. measures are in relation to something.

I want the things "I care about" to have short description lengths.

I want "any possible goal" to have short description lengths.

What does it mean for us to have "control" over a space?

How would the human algorithm decide what abstractions give control? Empirical generalization? Something else?

How do you tell apart the concept of "blbl" and "yellow"? How do I relate this to things I already know? forward-chaining metric

Evaluate question: What can I do that I couldn't do before.

"what can I do" is defined in abstractions.

example: abstraction of Newtonian mechanics. allows me to (on a non-windy day) figure out where a cannonball lands (without first shooting). previously I had to shoot the cannon at different angles with different amounts of gunpowder and make a chart. with Newtonian mechanics I can shoot like 4 times, and then figure out the rest of the chart.

we have abstraction of:

  • shooting at target X

but somehow we "don't know" "how" to shoot at target X. given "Newtonian mechanics" we now "predict" that we can shoot at target X

what grounds our "prediction"?

if I give you "bullshit mechanics", why do you "not predict" that you can shoot at target X?

well, I might have some trust in "NEMech" (based on empirical experiments). {where does this trust come from?} I have no such trust in "BSMech". "TRUST MAGIC" vouches for "NEMech". We trust "TRUST MAGIC". Therefore we trust "NEMech".

assuming "NEMech" (or "BSMech") is true, it allows us to shoot at target X. This is high actuation. We have lots of control over our space. How is this determined? Our current abstractions have a notion of "shoot at X, for all X". our current abstractions vouch that "NEMech" makes the claim "shoot at X, for all X". therefore it's useful. Why is "shoot at X, for all X" evaluated as greater actuation than "shoot at X, for some X"? "NMech" doesn't give me access to "shoot at X, for all X". it gives me a "cheaper method for figuring out how to shoot at X, for all X".

how do we evaluate "good abstractions"?

conjecture: abstractions are only meaningful in relation to other abstractions. It's similar to currency. There is no "true objective" value of currency; you have to know what relations other people have to the currency. There is no "true objective" value of an abstraction; you have to know what relations other abstractions have to it.

maybe when an abstraction gets used, it gets reinforced? I.e. when the "shoot ball at X" abstraction uses the "NMech" abstraction, the "NMech" abstraction gets reinforced. this can happen in a simulation.

imagine you have a "thingymajig" (you've never seen it before) on a table. you naturally generate the "thingymajig" abstraction, but not the "thingymajig-table" abstraction or the "thingymajig-air" abstraction or the "left-thingymajig" & "right-thingymajig" abstractions.

the "thingymajig" abstraction interacts nicely with your other abstractions (since it is solid in this case). It interacts "simply" with the "pick up object" abstraction. Meanwhile the "pick up object" abstraction interacts in a "complex" way with "thingymajig-table". the "pick up object" abstraction interacts somewhat fine (why is it so heavy???) with the "left-thingymajig" abstraction, but this might not be the best way to cut up the space, since we can have a larger abstraction (the whole thingymajig) which covers more "stuff" without complicating our interactions with other abstractions.

what does it mean for abstractions to interact "simply" or "nicely" instead of "complexly"?

Johannes

  • Just as a human doesn't know how the flinching algorithm works, why it is good, or why it is the right thing to do, you could develop a reasoning algorithm where the human likewise doesn't understand why it is good or why it is the right kind of reasoning algorithm, yet it actually works, because it was iteratively refined and built by evolution.
  • Humans learn knowledge, it's not encoded in genetics.
  • Thoughts you can have are limited and derived from the sensory inputs that we have.
  • Humans are not super general in the sense that they couldn't imagine a 4D environment as well as a native 4D inhabitant. Similarly for people who can't recognize faces.
  • Humans are resistant to ontology shifts also because they can't just rewrite their very basic cognitive architecture.

Meta

  • Need more focus
  • Need to be more technical.

what does having figured-out abstractions look like? what can you do that you could not do before?

abstractions are objects which capture most of the information that is "relevant" and discard the information that is not.

  • A ball can be modeled via its size, which can be described by its radius, and material properties like bounciness, weight, position, velocity, color, and so on, such that most predictions "you care about" making can be made.
    • Ideally, the abstraction knows when it is valid (in what contexts it produces correct predictions).
      • abstractions can break in extreme temperatures, in space, etc., such as when throwing the ball into the sun.

We can have abstractions that are useful "in general", no matter your goal. (this is sort of natural-abstractions but not quite)

  • Different from natural abstractions in that there might be many different encodings of some underlying truth. There might be many qualitatively different consequentialist reasoning algorithms. We don't know which one the AGI would have.

high-actuation space: a space where "actuation" is cheap and fast; a space over which there is large control. "generally useful" abstractions are abstractions which "provide" these kinds of high-actuation spaces.

"short description length" for the api-entity.

"yellow shark which says hello on tuesdays and has wheels and drives forwards then goes to the side and then grows wings and flies around the earth and it is livestreamed and turns into a meme"

you can imagine this very fast in your head. creating it in Unity is a lot more complicated.

  • High actuation space is related to world state graphs, specifically power. High actuation space is like saying there is a data structure that has a certain number of possible states. The actuation space is higher if it takes fewer actions to reach more states in the data structure.
    • High actuation spaces are useful properties of substates of the world. I have more control over me than over you.
    • Telling P7 to do something makes it do it. It's a very high-level API to control my body, meaning most commands I care to make are very short.
    • However, I cannot say all the things I might want to do. Then I need to drop down to a lower-level API, like controlling my body directly.
  • I can think of a yellow shark because I have an algebra over objects in my brain. Just like in Unity I can have a material which has an albedo, together with a color object and an algebra such that adding a color to an object replaces the albedo of the object's material with the color.

(yellow shark) which (says (hello on tuesdays))

"yellow" + "shark": yellow is a "colour", "shark" is an "object"; a rule for applying "colour" -> "object"

"which": variable assignments

(note that "which" means different things based on context.)

"says hello": "hello" is a word; "says" is a verb (specifies what the object does: dynamics through time)

on tuesdays (quantifier on the verb) (in general you can have an arbitrary logical property specifying when something holds)

has (property assignment)

Language can be interpreted formally pretty straightforwardly (in terms of references, combinators, and modifiers on objects/concepts).

11.1. Combining Object Instances πŸ”—

Objects can be modeled as a map of properties like

structure Car :=
  Make : String
  Model : String
  Built : Datetime
  Color : Color

together with combinator logic, i.e. a definition of how the object combines with other objects. For example, yellow can be an object like:

(def-abstraction Yellow
  (properties {:color "#FFFF00"})
  (combine-with (abstraction?)
                (set-if-exists (color abstraction) (:color Yellow))))

such that we can say Yellow + Car { color := black, … } to get a description of a yellow car.
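A minimal sketch of this combination in Python (the property-map representation and names like make_yellow are ours, purely illustrative):

# objects as property maps; "yellow" carries its own combination rule
def make_yellow():
    def combine(other: dict) -> dict:
        # set the color only if the other object has a color property
        if "color" in other:
            return {**other, "color": "#FFFF00"}
        return other
    return combine

yellow = make_yellow()
car = {"make": "Toyota", "model": "Corolla", "color": "black"}
print(yellow(car))  # {'make': 'Toyota', 'model': 'Corolla', 'color': '#FFFF00'}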

11.2. Combining Algorithmic Abstractions - Newtonian physics and Wind πŸ”—

Can we combine the abstractions Wind and Newtonian physics?

Newtonian physics can be represented by a program P : WorldState → WorldState. Multiple ways of combination exist:

  1. Let P : List Parameters → WorldState → WorldState, while wind is:
structure Wind :=
  strength : ℝ
  direction : Vec3

such that a Wind instance is passed to P. Now P needs to know how to use the Wind instance correctly.

  2. We could think of P and Wind as ECS (entity component system) systems: P updates arbitrary values, while Wind updates only the velocity component. Both systems can be computed independently and then combined by vector addition of the velocity components.

Here we are effectively adding two programs by combining their results (which is easy because the result datatype already has an algebraic structure).

  3. Add the programs directly. This seems very hard.
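A minimal sketch of option 2 (all names and numbers are illustrative): run both updates independently and combine them by adding their velocity deltas.

from dataclasses import dataclass

@dataclass
class State:
    pos: tuple[float, float, float]
    vel: tuple[float, float, float]

def add3(a, b):
    return tuple(x + y for x, y in zip(a, b))

def newton_step(s: State, dt: float = 1.0) -> State:
    # stand-in for "Newtonian physics": move by velocity, apply gravity
    return State(pos=add3(s.pos, s.vel), vel=add3(s.vel, (0.0, -9.81 * dt, 0.0)))

def wind_step(s: State, strength=1.0, direction=(1.0, 0.0, 0.0)) -> State:
    # wind only touches the velocity component
    return State(pos=s.pos, vel=add3(s.vel, tuple(strength * d for d in direction)))

def combined_step(s: State) -> State:
    # compute both updates independently, then add the velocity deltas
    n, w = newton_step(s), wind_step(s)
    delta_n = add3(n.vel, tuple(-v for v in s.vel))
    delta_w = add3(w.vel, tuple(-v for v in s.vel))
    return State(pos=n.pos, vel=add3(s.vel, add3(delta_n, delta_w)))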

11.3. Examples of high actuation Spaces πŸ”—

My mind's eye is a high actuation space. What makes it so?

  • I have concepts like shark that have visuals associated with them. I have yellow, which has visuals associated with it. I can add these concepts to get a new concept with different visuals, i.e. I can combine concepts to very quickly build up an almost infinite variety of visuals.

11.4. Money is useful. How do I know? πŸ”—

In the past I bought things. If in the future I want something that I don't want now, I could use money to buy it.

Programming, physics and math are also generally useful.

To determine if a concept is useful, you can check how it combines with your existing concepts. ComputerCraft.

In Lean, every definition has a type. To check whether a proof step I need to perform is one I already know how to perform, I can simply search through all the types in the library. We can index the types such that this is fast. For concepts, we can imagine a similar thing, where we look at the shape of the concept and match it to concepts we already know.

The concept yellow, which has a color property that it sets on an object, can interact with any object that has a color. If you have many objects that have a color, this concept is good.
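A minimal sketch of this shape-based lookup (entirely illustrative): index concepts by the property they operate on, so checking which known objects a new concept combines with becomes a fast query.

# index: property name -> concepts that operate on that property
index = {"color": ["yellow"], "force": ["pickaxe", "tnt"]}

def combinable(obj: dict):
    """Return the concepts that can interact with this object's properties."""
    return [c for prop, concepts in index.items() if prop in obj for c in concepts]

car = {"make": "Toyota", "color": "black"}
print(combinable(car))  # ['yellow']: any object with a color can be painted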

Now we need different mechanisms to:

  1. evaluate if the combinations are useful
  2. generate the concepts in the first place.

11.5. Evaluating with simulations πŸ”—

Newtonian mechanics allows us to simulate the world, and then evaluate the results. New concepts could be evaluated in simulation.

Even if you want general knowledge, you might want to set yourself specific targets. Targets where you expect if you'd solved them, you would need to have understood something new and important. And then evaluation in a simulation becomes easier because you know where you're trying to get.

Ask the question: what is a thing I cannot do right now, at all? Is it because it is physically impossible, or because I don't have certain knowledge? If it is because I don't have certain knowledge, then try to obtain that knowledge.

This allows you to understand if you have learned something.

Specifically, we start in a situation where, using our world model, we can construct target states that we don't know how to reach. We then try to optimize for reaching such a state, gathering the required knowledge along the way.

Iterating this process with random goals would make us smarter and smarter.

How can we determine a target that is not too hard?

11.6. Alternative Evaluation πŸ”—

You can evaluate if flying would be a useful ability:

Brain:

  1. What is flying about? It's about moving my body around.
  2. Do I already move my body around? Yes, e.g. to go to the supermarket.
  3. Would flying be better than my current way to get to the supermarket? Yes, assuming I can fly fast.

Brain: Preliminary analysis indicates flying would be a good property to obtain.

The brain could think about it longer, discovering more advantages, disadvantages, and other facts like needing pilot goggles. This tends to increase the probability that the analysis is correct. The brain employs an anytime algorithm.

The main impressive thing is how fast the preliminary analysis completes.

11.7. How do you learn πŸ”—

Setup:

  • You have a specific goal just outside of what you know how to do.

At what point does the learning happen? I could:

  • Using the current world model, try to find a plan to reach the target.
  • I will get stuck at some point.
  • What is the shape of the hole in my mind?
  • Do wishful thinking: what action, if I could do it, would allow me to get to the goal?
    • E.g. if I had sticks I could make an iron pickaxe, but I don't know how to get sticks. How to get sticks is the knowledge bottleneck.
    • We generated a subgoal, i.e. an even narrower target.

11.7.1. Multilayer search πŸ”—

Setup:

  • The system should contain multiple planning algorithms.
    • The lowest level is exhaustive search over atomic concepts.
    • There are multiple levels of abstractions.

Our overall planning algorithm tries to plan at a high level as much as possible, but when it's not possible to find a solution at the current level of abstraction, we drop down to a lower level of abstraction. Ultimately we might end up using the exhaustive search over atomic concepts, which should be guaranteed to always find a solution, though likely very inefficiently.

Each time we drop down to a lower level of abstraction and find a solution, we try to distill the knowledge required for the solution into the higher level of abstraction, such that in the future we can plan more efficiently.
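A minimal sketch of this control flow (ours; the planners themselves are left abstract):

def multilayer_plan(goal, planners):
    """planners: plan(goal) callables, ordered from most to least abstract.
    The last one is assumed to be exhaustive search over atomic concepts."""
    for level, plan in enumerate(planners):
        result = plan(goal)
        if result is not None:
            # a solution found below the top level is a distillation candidate:
            # compress it into new higher-level concepts for future planning
            return level, result
    raise RuntimeError("even exhaustive search failed")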

11.7.2. Distilling Concepts πŸ”—

Once we have generated plans that reach states we didn't know how to reach before, we want to be able to take the new insights from generating a plan and update our world model based on them, such that we can plan more efficiently in the future.

The low-level plan implies a hole in the higher-level conceptual space.

11.8. Cannonball buttons πŸ”—

  • If you don't have a model, you can empirically probe the world.
  • If you have a model, you can calculate what to do without probing the world (probing is expensive).

Imagine you want to shoot the cannonball. You could set up three locations and through trial and error find the exact settings that make the cannonball fly to these locations. Alternatively, you could know Newtonian mechanics which would allow you to compute how to aim the cannon such that the cannonball lands anywhere.

Adding buttons to fire at the tried-and-errored locations would make shooting there very easy. Calculating Newtonian mechanics is still hard. Finding Newtonian mechanics, though, makes adding new buttons easy.

11.9. General Algorithm for Efficiency Gain πŸ”—

We can write a logical predicate selecting desired elements to perform exhaustive search.

A general way to make this more efficient is to try to prune as many possibilities as early as possible. The human brain doesn't even generate extremely stupid plans, like jumping from the third floor to get down to the street, even though it is faster.

Let \(W\) be a set of world states. Let \(P\) be a predicate selecting desired world states.
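A minimal sketch (ours; the prune heuristic is left abstract): exhaustive search over action sequences where hopeless branches are cut before their successors are generated.

def search(initial, actions, P, prune, max_depth):
    """Find an action sequence leading to a state satisfying P."""
    frontier = [(initial, [])]
    while frontier:
        state, plan = frontier.pop()
        if P(state):
            return plan
        if len(plan) >= max_depth or prune(state):
            continue  # pruned: successors are never even generated
        for act in actions:
            frontier.append((act(state), plan + [act]))
    return None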

11.10. Concept Desiderata πŸ”—

Two targets:

  • Concepts should be few. Therefore having larger objects, like table, instead of this table plate with these legs, is better.
  • Concepts shouldn't break easily. A plate on a table is a concept that breaks when I take away the plate.

12. 2024-08-19 - Maren - Zendo πŸ”—

See also how to turn a potato into the sun in phase-I-destillation.

Assume we are playing BitStringSender, meaning you need to guess the predicate function over all bitstrings.

12.1. Zendo Strategies πŸ”—

  • be minimal.
  • Reuse your existing data
  • Once you have a guess, generate positive and negative examples to verify it.
  • Generate examples where you're maximally uncertain about the outcome. Heuristic for most relevant evidence.
  • Side channels
    • The time it takes the master to compute the predicate can provide information.
  • Find an example such that a perturbation to this example, as small as possible, changes the truth value. E.g. append a single 0.
  • predict every outcome.
  • You need to try to falsify your predictions.
    • Otherwise, through sheer accident, there might be a pattern you spotted in the data you generated, and that pattern would always hold when you verify it (for example, if just any string satisfied the predicate), meaning you started with a too-specific theory and never discarded it because you didn't try to falsify it.

12.1.1. Verification strategy πŸ”—

Strategy: Generate random bitstrings and then use a minimal transformation on each bitstring such that, according to your current best-guess property, it will be false, and another modification such that it will be true.

Strategy: generate pairs of bitstrings that are very close (for example, only one bit is different), such that one of the bitstrings makes the property true and the other makes it false. Check with these bitstrings whether your theory is correct.
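A minimal sketch of the second strategy (the predicate P is an illustrative guess, not from the session):

import random

def flip_bit(s: str, i: int) -> str:
    return s[:i] + ("1" if s[i] == "0" else "0") + s[i + 1:]

def boundary_pairs(P, n: int, tries: int = 1000):
    """Random pairs (a, b) differing in one bit, with P(a) true and P(b) false."""
    pairs = []
    for _ in range(tries):
        a = "".join(random.choice("01") for _ in range(n))
        b = flip_bit(a, random.randrange(n))
        if P(a) and not P(b):
            pairs.append((a, b))
    return pairs

# e.g. test the guess "the first two bits are equal"
P = lambda s: s[0] == s[1]
print(boundary_pairs(P, n=4, tries=50)[:3])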

12.1.2. Shared Properties of Categories πŸ”—

12.1.3. Minimize the Context you consider πŸ”—

Continually try to simplify patterns.

12.2. Def: Boundary Set πŸ”—

When playing Zendo with bitstrings of length \(n\), consider the subset of 2-tuples of bitstrings \(B \subseteq \left( {\mathbb{B}}^{n} \right)^{2}\) such that \[\forall(a,b) \in B,\exists i \in \left\{ 1,2,\ldots,n \right\},\text{ flip-bit}(a,i) = b \land P(a) \land \neg P(b),\]

meaning, the 2-tuples describe a boundary. The first entry in the tuple is a point for which \(P\) still holds, while the second is a β€œclose by” point for which \(P\) doesn't hold.

Assuming we have this boundary set. How can we generate a hypothesis from this set? I intuit it's helpful.

13. 2024-08-19 - Thomas - Reflection πŸ”—

13.1. What we did πŸ”—

We tested an intuition by turning it into an algorithm and then empirically evaluating whether that algorithm is capable of gathering, given a predicate, useful examples with which you can infer what the predicate is in bit strings and all.

Using the algorithm, Thomas figured out in less than 10 minutes the predicate that the first two things in a string need to be the same and the last two also need to be the same, whereas I took over an hour.

13.2. Reflection πŸ”—

This seemed very good; so far we had never managed to get to the point of writing code such that we have some concrete outcome, like empirical evidence that some algorithm is actually probably good.

13.2.1. Why did it work πŸ”—

I had some pretty good ideas and thoughts about what might be important to look at, specifically that finding boundary points is probably important.

It was helpful to first play the game ourselves and to choose the BitString Zendo game, which is easy to code. And it was probably useful to do interactive programming in Clojure, where it's easy to try out things.

The project was not so big that it was impossible to finish in one meeting (meaning the small thing we did).

14. 2024-08-20 - Julia πŸ”—

14.1. Why is Zendo important πŸ”—

Solving Zendo requires to know:

  • How to form hypotheses?
  • How can we ask the right questions of the environment that give us maximum information (minimizing the amount of evidence we need)?
    • What is good evidence?
  • How to efficiently verify a hypothesis?

How to form hypotheses is the bottleneck.

how-to-turn-a-potato-into-the-sun describes a heuristic to obtain good observations. Having good observations could make it easier to find the hypothesis generation algorithm.

14.2. Julia's Algorithm πŸ”—

We give a simple Solomonoff-inductor-style Zendo player. In addition to updating, it selects the bitstring which would eliminate the most hypotheses.

Let \(\mathcal{P}\) be the set of programs with description length \(< n\). Let \[\forall p \in \mathcal{P},\; P(p) \propto \frac{1}{2^{2|p|}},\] meaning \(\mathcal{P}\) is distributed according to the universal prior.

Now, we want to find a bitstring that eliminates maximally many hypotheses:

[Image: formula for the probability-mass-weighted sum over hypotheses' predictions on a candidate bitstring]

A bitstring where the above sum is zero eliminates half the hypothesis space, in terms of probability mass.

To speed this up:

  • Terminate programs that run too long.
  • Only sample a random subset from \(\mathcal{P}\).
  • Randomly sample bitstrings to try.
  • Terminate if you are close enough.
  • Use HVM SUP nodes.

Prototype: it should be possible to compute many uncertainty-reducing bitstrings and, by studying them, reduce the uncertainty further. Imagine you figured out some patterns in addition to just updating your probability distribution: for example, that the first bit is always 1 if the string is true. This might just be part of the rule and not the complete rule. But in this case, you could still determine that some particular strings your distribution is uncertain about are likely to be true or false.
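A minimal prototype sketch (ours; it uses a small finite hypothesis class instead of all programs): pick the bitstring whose probability-mass-weighted vote is closest to zero, so that either answer eliminates about half the mass.

from itertools import product

def best_query(hypotheses, prior, n):
    """hypotheses: predicates on bitstrings; prior: weights summing to 1."""
    def vote(x):
        # in [-1, 1]; 0 means the probability mass is split evenly on this query
        return abs(sum(w * (1 if h(x) else -1) for h, w in zip(hypotheses, prior)))
    return min(("".join(bits) for bits in product("01", repeat=n)), key=vote)

hs = [lambda s: s[0] == "1", lambda s: s.count("1") % 2 == 0, lambda s: True]
print(best_query(hs, [0.5, 0.3, 0.2], n=3))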

15. 2024-08-20 - Gurkenglas πŸ”—

  • The rule where only a single random bitstring maps to one is easy to specify, but extremely hard to guess.
  • If you return some data structure, like a real number, instead of a boolean, it usually gives you more information, as long as the actual function is injective.

How can you iterate all programs that terminate? Which formalism is best?

16. 2024-08-21 - Programming Language Features πŸ”—

  • Homoiconic
  • Runtime Evaluator
  • Macros
  • Continuations (clojure lib)
  • Spec
  • Theorem Proving
  • Interactive Programming
    • Hot reloading
    • Partial evaluation (e.g. Lean4)
    • Interactive visualisations (e.g. Unity)

17. 2024-08-21 - Mateusz BagiΕ„ski πŸ”—

You want to detect and model the structure that's in the world, in order to make an algorithm that can efficiently reason about a world that has this particular kind of structure (like a finite state automaton), and then be able to infer all the things you don't know (e.g., which concrete finite state automaton you are in, given only that it is a finite state automaton), and do that efficiently.

  • Initially, we don't know anything about the world at all.
  • We want to observe the world and infer what kind of repeating patterns are there.
  • We want to model the physics of the world, things that are always true, and figure that out.
  • Imagine a Minecraft where there is a zombie. You can learn how to handle the zombie and fight it in general, such that next time you fight a zombie, you recognize the pattern and can reuse your knowledge.
  • Using the structural knowledge, you can design a bunch of algorithms, and also use empirical observations, to perform tasks like: given that I know I'm in a finite state automaton right now, how do I efficiently query the world for observations such that I can increase my probability mass on the correct automaton as fast as possible?

17.1. What am I interested in? πŸ”—

  • What is intelligence?
    • How do you create a model of the world?
      • Given the world state graph, how do you extract the structure of the world?
    • What is the datastructure that can represent a very complicated world well?
    • What algorithms have the property that any AGI that you might look at would have some version of these algorithms?
      • creativity
      • Update the world database on observations.
      • run simulations using a world model
  • How do you make a mind want to help you?
    • Have your best interests in mind.
    • Be wary of its own ignorance to not accidentally do something you don't like when it's uncertain what you won't like.
    • Human can totally do this. I can imagine there is an alien somewhere and now I need to optimize for the alien.
      • I wouldn't start by shooting nuclear missiles at that planet, probably.
      • Sending them some technology blueprints might be nice though.
  • Goals
    • How do you specify a goal that's not what you want and then have the agent still do what you want? (corrigibility)
    • How do you make a goal such that a small perturbation of the goal causes only a small perturbation in the effect of what world state maximizes the goal?
  • Embedded agency

17.1.1. Meta πŸ”—

  • How do you do research?
    • What's the best programming language?

18. 2024-08-21 - IA - FSA inference πŸ”—

See phase-I-destillation.

misc

  • These things compose into subproperties.

:todo: My brain can study finite state automata, meaning I'm updating my world model with knowledge relevant to finite state automata. Using this knowledge allows me to reason better about finite state automata, such that I can do things like write an algorithm which can correctly guess what the finite state automaton is.

We want to find the algorithm \(W\) which can perform this kind of world-model learning. \(W\) initially knows nothing about FSAs. It then builds a model \(M\) of FSAs that enables efficient reasoning about FSAs in general (not about any specific FSA).

As \(W\) improves \(M\), \(W\) becomes better at playing FSA Zendo, especially in terms of resource usage. More generally, \(W\) makes it easier to reason arbitrarily about FSAs. E.g. it could say how an FSA behaves on average.

(\(W\) can potentially be split into multiple algorithms.)

There are multiple steps:

  1. Infer the model ("axioms") of a finite state automaton.
  2. Discover "powerful theorems" about finite-state automata.
  3. Use theorems to reason.

Sample a class of programs, like FSAs, and then learn what properties that class has.

To make this more concrete, consider the example of playing Zendo again. We could use our theorems to predict what observations would be good to make.

This seems related to Formalizing the presumption of independence (Christiano et al., 2022).

18.1. Reflection πŸ”—

It seems like I fail to consider and write down the motivations behind what I am doing.

I mainly follow my intuitions, but often when people ask me why I am doing something, I am able to produce an explanation. Generating this explanation often helps to clarify my understanding.

It seems good to be more conscious about this, and to explicitly and intermittently ask myself: why am I doing this, why is this a good idea, and what does it look like if I succeed?

19. 2024-08-23 - Thomas - FSA inference πŸ”—

  1. I read the definition of an FSA on Wikipedia.
  2. I play Zendo against a randomly sampled FSA that takes in bitstrings and returns 0 or 1.
  3. I use the knowledge about how FSAs work to interpret the return values and calculate what questions to ask.
    • Before making any observation, notice that giving the FSA an empty input tells me whether the start state is accepting or rejecting. This rule doesn't tell me which is the case for any specific automaton, but rather tells me that if I look at the empty input's return value, it will tell me which of these is the case.

More formally: \[\forall f \in \text{ FSA},f\left( \lbrack\rbrack \right) = f.\text{states}\left\lbrack f.\text{start-state} \right\rbrack.\text{return-value }\] where \(f(x)\) means running the FSA on the input \(x\), while \(A.B\) accesses the field \(B\) on an object \(A\).

Note that this is an unconditional theorem, meaning it's a very general piece of knowledge, as discussed in The Generality of Knowledge.

This inference can be made before any observations are made. It's a general rule true of finite-state automata. There are many such rules you can figure out. Basically, take the definition of a finite-state automaton as the axioms and then prove all theorems (the above is one of these theorems).

We want an algorithm that can look at the world, study it, and derive the knowledge that's generally useful for reasoning about that world. The FSA setting is a very specialized instance of this problem.

People struggled in the past to build automatic mathematicians because it's hard to say what math is good. However, now we have a setup where we play a game such that the better you are at finding good theorems, the better you perform in the game. What counts as a good theorem is indirectly specified by the game, and you can measure performance by looking at the game.

Actually, our overall problem here is basically what game players face. When you first play a game, you don't know how it works. You study the basic rules of the game. Using your knowledge you build up a library of knowledge that is generally useful for handling a wide range of situations in the game. Usually you never encounter the exact same situation twice, so your knowledge can't be hardcoded stimulus-response patterns. Instead, in addition to your factual knowledge, you need algorithms that can construct solutions to specific situations on the fly, based on that knowledge.

19.1. The generality of Knowledge πŸ”—

Generally there are different kinds of knowledge:

  • Running away when low on health in LoL is a generally good strategy.
  • As a medic in Battlefield you want to try to survive as long as possible, while reviving as many teammates as possible.
  • In Insurgency you might see somebody running to cover. They are now occluded, but you can calculate how to throw a grenade to blow them up.

Running away on low health is a good strategy that generalizes across most games. If you don't know what game you are playing, it's a good bet that doing this is good.

The medic strategy is only good in the context where you actually are a medic. When you fly around in a helicopter you can't even revive anybody. Part of the strategy becomes nonsensical.

Knowing how to exactly throw a grenade when you are at position A, and the enemy is at position B, given the (potentially partly destroyed) terrain C, where A, B, and C have some concrete values, is only useful in exactly this situation. There are so many possible values for A, B, and C that it is infeasible to precompute how to throw a grenade in each possible situation.

Instead you need a model of grenades. This consists of knowledge: how do they bounce, how far can I throw them, how arced is their trajectory, how long until they explode, how much damage do they do at what ranges. We also need an algorithm that can, based on this knowledge, quickly compute how to throw the grenade in every situation that is likely to be encountered.

Generally each piece of knowledge has a β€œusage context”, meaning the knowledge is only true/useful if the context satisfies certain predicates (like conditional theorems in math).

19.2. Factual VS Procedural Knowledge πŸ”—

So generally our world model is structured to have factual knowledge, such as this particular object has these general properties, as well as algorithms that can efficiently calculate solutions to specific problems given the factual knowledge.

Roughly, factual knowledge is just data (a tree has the burnable attribute set to true). Note though that any factual knowledge could be represented by an algorithm, and any algorithm could be unrolled into pure data (generate a lookup table).

Usually, procedural knowledge requires more compute but less storage than factual knowledge, because solutions can be (and need to be) generated on the fly.

19.3. In summary πŸ”—

So how does learning about the world look like?

  1. Figure out the basic rules of the world.
  2. Learn increasingly specific strategies that apply to specific situations.
    • Don't learn strategies so specific that they only apply to a single situation, when there are many possible situations of that kind and you don't know in advance which one you would be in. Instead, construct knowledge and an algorithm that can dynamically generate a solution.

Knowledge/Strategies can be general.

19.4. A formalism of Knowledge πŸ”—

structure Knowledge where
  conditional : Prop
  consequent : Prop

Knowledge is more general the weaker its conditional is.

Example: Linear search is more general than binary search because it does not require the input list to be sorted.

Knowledge is procedural if the consequent makes statements about an algorithm.

Example: IF the list is sorted THEN binary search finds any element in it AND runs in \(O\left( \log n \right)\).
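To make this concrete, here is a minimal Lean sketch instantiating the structure for the binary search example. The two predicates are hypothetical placeholders standing in for real definitions:

-- Hypothetical placeholder predicates (assumptions, not real definitions).
axiom sorted : List Nat → Prop
-- "binary search finds any element of l and runs in O(log n)"
axiom binarySearchFinds : List Nat → Prop

-- The binary-search knowledge from the example above.
def binarySearchKnowledge (l : List Nat) : Knowledge where
  conditional := sorted l
  consequent := binarySearchFinds l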

19.5. Zendo Player 2 πŸ”—

\[\forall f \in \text{ FSA},f\left( \lbrack\rbrack \right) = f.\text{states}\left\lbrack f.\text{start-state} \right\rbrack.\text{return-value }\]

\[P_{1}(f) = \top,P_{2}(f) = \left( f\left( \lbrack\rbrack \right) = f.\text{states}\left\lbrack f.\text{start-state} \right\rbrack.\text{return-value} \right)\]

\[\forall f \in \text{ FSA},\forall x \in f.\text{return-values},f\left( \lbrack\rbrack \right) = x \Leftrightarrow f.\text{states}\left\lbrack f.\text{start-state} \right\rbrack.\text{return-value } = x\]

\[\forall f \in \text{ FSA},\forall x \in f.\text{return-values }:P_{1}(f,x) \Rightarrow P_{2}(f,x)\]

19.6. Why Intuitionistic Logic is good (ABANDONED) πŸ”—

Construct a minimal example where we have a set of axioms \(A\), from which we can prove a theorem \(T\) using LEM, such that we can add an axiom (that doesn't make the axioms inconsistent) to \(A\) such that \(T\) is no longer a theorem.

  1. Axioms: \(A \rightarrow \bot\)
  2. Theorem: \(\neg A\)
  3. Proof: Assume \(A\) then \(\bot\), therefore \(\neg A\).


  1. Axioms: \(A \rightarrow B,B\)
  2. Theorem: \(A\)
  3. Proof:

19.6.1. LEM example πŸ”—

\(\neg P \rightarrow Q \Leftrightarrow P \vee Q\)

By LEM we have \(P \vee \neg P\). If \(P\), then both sides hold trivially. If \(\neg P\), then the LHS holds iff \(Q\) holds, and the RHS also holds iff \(Q\) holds.

19.6.2. Proof by contradiction πŸ”—

\[(A \Rightarrow B) \Rightarrow (\neg B \Rightarrow \neg A)\] This direction of contraposition is valid in intuitionistic logic; it does not use LEM. The converse, \[(\neg B \Rightarrow \neg A) \Rightarrow (A \Rightarrow B),\] i.e. showing \(\neg B \Rightarrow \neg A\) to conclude \(A \Rightarrow B\) (confusingly also called proof by contradiction), does require LEM.

LEM: \((\neg B \Rightarrow \neg A) \Rightarrow (A \Rightarrow B)\)

Axiom 1: \((\neg B \Rightarrow \neg A)\)

classical logic can derive \((A \Rightarrow B)\)

Can adding an axiom 2, \(\neg(A \Rightarrow B)\), give us a contradiction in intuitionistic logic? \[\begin{aligned} \ Ax.2\ \\ \neg(A \Rightarrow B) & \Leftrightarrow \neg(\neg A \vee B) \\ & \Leftrightarrow (\neg\neg A \land \neg B) \end{aligned}\]

\[\begin{array}{r} \ Ax.1\ \\ (\neg B \Rightarrow \neg A) \Leftrightarrow \neg\neg B \vee \neg A \end{array}\]

\[\begin{array}{r} \text{ Rain } \Rightarrow \text{ I am wet} \\ \text{I am not wet } \Rightarrow \text{ It is not raining. } \end{array}\]

AX1 \[\left( \text{I am not wet } \Rightarrow \text{ It is not raining.} \right)\]

Infer with LEM: \[\begin{array}{r} \left( \text{I am not wet } \Rightarrow \text{ It is not raining.} \right) \Rightarrow \\ \left( \text{Rain } \Rightarrow \text{ I am wet} \right) \end{array}\]

Now modify AX1 to: \[\left( \text{I am not wet } \Rightarrow \text{ It is not raining. } \vee \text{ I have an umbrella} \right)\] Now \(\left( \text{Rain } \Rightarrow \text{ I am wet} \right)\) is no longer derivable!

On the other hand, if we start with the single axiom AX1: \[\left( \text{Rain } \Rightarrow \text{ I am wet} \right)\]

Then both intuitionistic logic and classical logic can make the following inference: \[\begin{aligned} & \left( \text{Rain } \Rightarrow \text{ Wet} \right) \\ \Rightarrow & \left( \neg\text{ Wet } \Rightarrow \neg\text{ Rain} \right) \end{aligned}\]

If we modify AX1 to \[\left( \text{Rain } \vee \text{ Water-gun } \Rightarrow \text{ Wet} \right)\]

In classical logic, if the laws of physics make it such that when I am not wet it's never raining, then I can infer that rain makes me wet.

Then my friend buys a water gun.

\(\neg\text{ Raining } \rightarrow \left( \text{Raining } \vee \text{ Umbrella} \right)\)

\(\text{Raining } \rightarrow \text{ Wet}\)

\(\left\lbrack \neg\text{ Raining } \rightarrow \left( \text{Raining } \vee \text{ Umbrella} \right) \right\rbrack \rightarrow \left( \text{Raining } \vee \text{ Umbrella} \right)\)

19.7. Forming Hypotheses in Zendo πŸ”—

19.7.1. Classes of Theorems πŸ”—

  • Structural Properties (FSA property deduction): Given that I know an FSA has some property, I can have a theorem that tells me that another property must then hold.
    • An FSA with one state always returns the same value.
  • Predictive Properties (FSA structure necessitates observations): An FSA with property \(P_{1}\) will produce outputs satisfying a property \(P_{2}\).
    • If the current state has self-loops, then any input string consisting solely of inputs that use these self-loops will return the same value, no matter its length.
  • Observational Properties: Given some observations we can deduce something about our FSA.

In theory we can also have the following:

  • Given a finite set of observed input output behaviors, return the set of all compatible FSA.
  • Given a set of observed input output behaviors, return a property that must hold for this FSA.

However, if we don't constrain the hypothesis space (it can be any FSA), then we can never have theorems of this form. The first one would return a set that is infinitely large, because there are infinitely many FSAs that match any observed behavior.

For any finite set of observations, there is always an FSA that has all these observations hardcoded into its states, and then does something completely different.

In the second case, it seems that we can't state useful properties that are definitely true, because there is always the FSA that just hardcodes everything, and that FSA is not extrapolatable.

19.7.2. Importance of structural properties πŸ”—

Suppose you know how FSAs work and you have made a certain number of observations, long before having observed all possible input-output pairs. If you know that the FSA only has 4 states, you will probably be able to say with 100% certainty that it must be exactly this finite-state automaton. You can say that because the structure of FSAs, together with the observations you have, lets you infer new things about the FSA deductively.

19.7.3. Inferring predictive properties πŸ”—

Generating predictive properties.

  1. (empirical) Use the axioms to run the FSA. And notice patterns. Describe these patterns with logic.
  2. (proof) Use the axioms of FSA and infer theorems about FSA.

Proving seems intractable. Maybe proof search is very bad because people haven't figured out how to coherently incorporate into the algorithm the creation and integration of relevant examples to guide the search.

Major problem: how do you notice and extract patterns from the observations we generated by running the FSA based on the formal FSA model?

This entire problem seems recursive. We want to make a world model generator. For that we want to generate hypotheses. Now, for generating these hypotheses, we again want to generate hypotheses about what predictive properties might hold.

This problem also shows up when empirically generating evidence: we want to hypothesise which FSAs would be good to run such that they give us more information about patterns.

Generate hypotheses about:

for generating predictive properties      when playing Zendo
What patterns are good                    What observations to make to decrease our uncertainty
What FSAs are good to simulate            What is the FSA in the Zendo game

The hypotheses on the left-hand side are the ones we want to generate here, in order to generate the predictive properties, while the ones on the right-hand side are the ones we want to generate when actually playing Zendo.

So the problem is recursive in the sense that to solve the original problem, we need to solve a problem very similar to the original problem. However, this subproblem is simpler, which might be key. Potentially we could construct a recursive algorithm which iteratively splits the problem until it becomes trivial.

19.7.4. Recursive steps πŸ”—

  • Find the FSA \(P\) that plays Zendo
    • Generate a new FSA \(S\) according to my understanding of FSAs to get more evidence
      • Hypothesise what FSA would be good to generate.
    • Run \(S\) to get input-output pairs.
      • Find patterns in the input-output pairs, i.e. search for logical sentences that are true of all the positive outputs, and ones that are true of all the negative outputs.
        • Form hypotheses about what is a pattern in the input-output pairs.
          • Find sentences that are true of all true outputs, and of all false outputs.
            • Form hypotheses about which logical sentences are true.

Idea: take the FSA that generated the observations so far and reduce it as much as possible, until only the parts that were actually used for generating the observations remain.

We want to find a statement like: let \(f\) be an FSA, and let \(P_{1}\), \(P_{2}\) be properties. If \(f\) has property \(P_{1}\), then the input-output behavior of \(f\) has property \(P_{2}\).

Recursive-2 property inference:

  1. Let \(P\) be a random property on FSAs.
  2. Let \(S\) be a randomly sampled set of FSAs satisfying \(P\) (dumbest thing: rejection sampling)
  3. Let \(G\) be randomly sampled input-output observations, for each FSA in \(S\)
  4. Let \(P_{2}\) be a property true of the input-outputs.
    • so, \(\forall(i,o) \in G,P_{2}(i,o)\)
    • (If this is too difficult we can just randomly remove input/output pairs, though this would make the \(H = S\) case stated later false.)
  5. Let \(H\) be a subset of the set of FSAs for which \(P_{2}\) describes the input output behavior correctly for all possible inputs.
    • If finding an \(H\) is too hard, simplify \(P_{2}\), e.g. simply randomly remove stuff from \(P_{2}\) (ideally be smarter about it).
    • In the simplest case \(H = S\)
  6. Finally let \(P_{1}\) be a property that selects a subset of \(H\) from the set of all FSAs.

(Seems related to property based testing simplifications.)

Note that the above is better than exhaustive program search, because we are guaranteed that there exists a set \(H\) corresponding to \(P_{2}\).
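A minimal Python sketch of steps 1-3 above (the dumbest-thing rejection sampling). Representing FSAs as plain dicts of transition and output tables, and the example property, are illustrative choices of mine, not fixed by the log:

import random

def random_fsa(n_states, seed=None):
    # Uniformly sample a DFA over the alphabet {0, 1}.
    rng = random.Random(seed)
    states = list(range(n_states))
    return {'states': states,
            'start': 0,
            'delta': {(s, b): rng.choice(states) for s in states for b in (0, 1)},
            'omega': {s: rng.randint(0, 1) for s in states}}

def sample_satisfying(prop, n_states, tries=10000):
    # Step 2: rejection sampling of FSAs satisfying the property P.
    for i in range(tries):
        f = random_fsa(n_states, seed=i)
        if prop(f):
            return f
    return None

# Example property P: the start state is rejecting.
f = sample_satisfying(lambda g: g['omega'][g['start']] == 0, n_states=3)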

Alternatively, generate easy instances of the original problem: we could also try to randomly sample a property on input-output pairs and then try to sample matching FSAs. If we pick a simple property on the inputs and outputs, we can generate instances of the original problem that are very easy (or rather of the right difficulty for learning).

Note that this could be used with the above recursive-2 property inference (again recursively).

19.7.5. Generating a property of input/outputs pairs πŸ”—

  1. Write down the input output pairs as logical sentences.
  2. Simplify, and add quantification.

\[\begin{array}{r} 000 \Rightarrow 1 \\ 001 \Rightarrow 1 \end{array}\] \[\begin{aligned} \neg I_{0} \land \neg I_{1} \land \neg I_{2} & \rightarrow \text{ output } = 1 \\ \neg I_{0} \land \neg I_{1} \land I_{2} & \rightarrow \text{ output } = 1 \end{aligned}\]

This can be simplified to \[\begin{aligned} \neg I_{0} \land \neg I_{1} & \rightarrow \text{ output } = 1 \end{aligned}\] and potentially even (makes more sense with more zeros): \[\begin{aligned} \left( \forall i \in \left\{ 0,1 \right\},\neg I_{i} \right) & \rightarrow \text{ output } = 1 \end{aligned}\]

Scrap: \[\begin{aligned} \left( \neg I_{0} \land \neg I_{1} \land \neg I_{2} \right) \vee \left( \neg I_{0} \land \neg I_{1} \land I_{2} \right) & \rightarrow \text{ output } = 1 \\ \Leftrightarrow \\ \left( \neg I_{0} \vee \neg I_{0} \right) \land \left( \neg I_{1} \vee \neg I_{0} \right) \land \left( I_{2} \vee \neg I_{0} \right) \vee \\ \left( \neg I_{0} \vee \neg I_{1} \right) \land \left( \neg I_{1} \vee \neg I_{1} \right) \land \left( I_{2} \vee \neg I_{1} \right) \vee \\ \left( \neg I_{0} \vee \neg I_{2} \right) \land \left( \neg I_{1} \vee \neg I_{2} \right) \land \left( I_{2} \vee \neg I_{2} \right) & \rightarrow \text{ output } = 1 \\ \Leftrightarrow \\ \left( \neg I_{0} \right) \land \left( \neg I_{1} \vee \neg I_{0} \right) \land \left( I_{2} \vee \neg I_{0} \right) \vee \\ \left( \neg I_{0} \vee \neg I_{1} \right) \land \left( \neg I_{1} \right) \land \left( I_{2} \vee \neg I_{1} \right) \vee \\ \left( \neg I_{0} \vee \neg I_{2} \right) \land \left( \neg I_{1} \vee \neg I_{2} \right) & \rightarrow \text{ output } = 1 \end{aligned}\]
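This kind of simplification is standard two-level logic minimization, so it can also be done mechanically. A minimal Python sketch using sympy's SOPform (Quine-McCluskey under the hood); the variable names are just for illustration:

from sympy import symbols
from sympy.logic import SOPform

i0, i1, i2 = symbols('i0 i1 i2')
# Inputs for which we observed output = 1: 000 and 001.
minterms = [[0, 0, 0], [0, 0, 1]]
print(SOPform([i0, i1, i2], minterms))  # prints: ~i0 & ~i1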

19.8. Next Steps πŸ”—

  • Distill the content such that we can evaluate where the bottlenecks are.
    • Write down the algorithm in a nicer format, with clear step by step instructions.
  • Currently the bottleneck seems to be: How to calculate a property that is true of all elements of a set (e.g. a set of bitstrings).
    • related: What are existing logic simplification techniques, and ways to add quantifiers?
  • How to use a predictive property once we have it?
  • Implement prototype algorithm

19.9. Reflection πŸ”—

  • Good conceptual progress.
  • We thought only at a very very high level of abstraction, writing down very very rough algorithm descriptions. This was good because it allowed us to move very fast, and identify bottlenecks, and get less confused.
    • It would have been a mistake to focus on writing down any part more cleanly.
  • As a general strategy you want to focus on the most confusing part, or the part you understand least how to do, instead of developing the things you already understand. However developing the things we already understand is the natural easy thing to do.

Meta

  • Using the laptop at the whiteboard is maybe good.
    • Makes me use whiteboard less.
    • More frequent switches between writing and whiteboard.

20. 2024-08-24 - Thomas - πŸ”—

20.1. Investigate algorithms by checking compression ability πŸ”—

Ideas

  • Compression

Let \(F\) be a compression algorithm. Let \(G\) be some other compression algorithm (e.g. gzip). Let \(D\) be some data. What does \(\mathcal{C}(D) ≔ \frac{|F(D)|}{|G(D)|}\), the ratio of the compressed sizes, tell you?

E.g. \(F\) is an algorithm that finds the perfect FSA that generates the data, and \(G\) is gzip.

If \(\mathcal{C}(D)\) is small then it means that FSAs are very good at representing \(D\), while the other compression algorithm is worse. It seems this could tell us something important about FSAs, namely what things are easy/natural to represent for FSAs.

So finding \(D\) such that \(\mathcal{C}(D)\) is small tells us what FSAs to study, namely the ones generated by \(F\).

The choice of \(G\) and its relation to \(F\) is very unclear.

Intuitively we want to capture the notion that we are surprised by how good \(F\) is at compressing something. For that we can also compare the description length of \(F\)'s FSA with the description generated by \(G\) together with the unzip algorithm.

If we find a \(D_{2}\) such that \(\mathcal{C}(D_{2})\) is large, then it tells us that \(G\) is much better there, meaning there is no FSA that can encode \(D_{2}\) well.

20.2. FSA based compression πŸ”—

Let \(A\) and \(B\) be finite sets. Let \(f\) be a function from \(A\) to \(B\).

Theorem: There exists an FSA with the same functional behavior as \(f\).

Proof: Because the function has only finitely many behaviors, the FSA can simply be a tree that selects the exact output state. \(\blacksquare\)

Conversely, restricted to a finite set of inputs, every FSA is isomorphic to some finite lookup table.

Let \(D\) be some data, consisting of a set of 2-tuples, where the first element of each tuple is a finite bitstring and the second is a single bit. We can now construct the hardcoding FSA \(H\) for \(D\).

Now iteratively simplify that FSA without changing its behavior, to get something which can generate \(D\).
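A minimal Python sketch of constructing the hardcoding FSA \(H\) as a trie over the observed bitstrings, using the same dict representation as in the earlier sketch; routing unseen branches to a sink state with a default output of 0 is an arbitrary choice of mine:

def hardcoding_fsa(data):
    # data: dict mapping bitstring tuples to output bits, e.g. {(1, 0): 1}.
    prefixes = {()}
    for inp in data:
        for k in range(1, len(inp) + 1):
            prefixes.add(inp[:k])
    sink = 'sink'
    delta = {}
    for p in prefixes:
        for b in (0, 1):
            child = p + (b,)
            delta[(p, b)] = child if child in prefixes else sink
    delta[(sink, 0)] = delta[(sink, 1)] = sink
    omega = {p: data.get(p, 0) for p in prefixes}  # default output 0 is arbitrary
    omega[sink] = 0
    return {'states': list(prefixes) + [sink], 'start': (),
            'delta': delta, 'omega': omega}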

20.3. FSA Backtracing πŸ”—

Given some concrete FSA, we can find an input that produces a particular output by first finding a state which has the desired output value, and then searching backward with BFS until we find the starting state, keeping track of the edges we use, which correspond to the sequence of inputs we need to give.

More formally, let \(f\) be an FSA \(\left( \Sigma,\Gamma,S,s_{0},\delta,\omega \right)\) with \(\Sigma ≔ {\mathbb{B}},\Gamma ≔ {\mathbb{B}}\).

We can find an input \(i \in \Sigma^{\ast}\) with \(f(i) = \gamma\), where \(\gamma \in \Gamma\), by selecting an arbitrary (reachable) state \(s \in S\) with \(\omega(s) = \gamma\), and performing BFS in the reversed graph describing \(f\)'s state transitions, until we reach \(s_{0}\).
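A minimal Python sketch of this backward search, reusing the dict representation from the earlier sketches. Starting the BFS from all states with the desired output (rather than one arbitrary state) sidesteps accidentally picking an unreachable one:

from collections import deque

def backtrace_input(fsa, gamma):
    # Shortest input i with f(i) = gamma, via BFS on the reversed
    # transition graph, from the states with output gamma back to the start.
    goals = [s for s in fsa['states'] if fsa['omega'][s] == gamma]
    parent = {g: None for g in goals}  # state -> (bit, next state toward a goal)
    frontier = deque(goals)
    while frontier:
        t = frontier.popleft()
        if t == fsa['start']:
            bits, s = [], t
            while parent[s] is not None:
                bit, s = parent[s]
                bits.append(bit)
            return bits  # input in forward order ([] if the start state already matches)
        for (s, b), u in fsa['delta'].items():  # forward edge s --b--> u, i.e. reversed edge u -> s
            if u == t and s not in parent:
                parent[s] = (b, t)
                frontier.append(s)
    return None  # no reachable state outputs gamma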

20.4. FSA output pair generation πŸ”—

Given an FSA \(f\), we can generate all possible input-output pairs up to some length by performing BFS, saving the input-output behavior as we move along. This avoids redoing computations.

E.g. given the input strings \(a ≔ 101\) and \(b ≔ 1010\), computing the input-output pairs \(\left( \left( a,f(a) \right),\left( b,f(b) \right) \right)\) naively would cause mostly the same state transitions to occur in the FSA twice.

Generally, with BFS we can deduplicate computation for bitstrings that share a common prefix. Though it is actually even better than this, because multiple inputs can take the same transition path.
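A minimal sketch of the shared-prefix enumeration, same dict representation as above; each prefix's end state is computed exactly once and reused for both one-bit extensions:

from collections import deque

def all_io_pairs(fsa, max_len):
    # Enumerate (input, output) for every bitstring up to length max_len,
    # sharing the state computation of common prefixes via BFS.
    results = []
    frontier = deque([((), fsa['start'])])
    while frontier:
        inp, s = frontier.popleft()
        results.append((inp, fsa['omega'][s]))
        if len(inp) < max_len:
            for b in (0, 1):
                frontier.append((inp + (b,), fsa['delta'][(s, b)]))
    return results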

20.5. NFA can model multiple DFA πŸ”—

20240824_153541.jpg

We can represent all DFAs up to some number of states \(n \in {\mathbb{N}}\) with an NFA.

To model a DFA \(f\) that has \(n\) states, we need an NFA with \(2n\) states where half of these states are accepting and half are rejecting, because we don't know how many states of \(f\) are accepting. By selecting particular transitions, we can then get anything from β€œall states accepting” to β€œall states rejecting”.

The starting node is a special case. We need 2 start states, one which is accepting, and one which is rejecting.

20.6. Playing Zendo with NFA 1 πŸ”—

  • Create large NFA \(N\) representing all DFA up to size \(n\).
  • Use FSA backtracking from (??, a) to eliminate the most candidates.
  • Alternatively find the state that has the most non-deterministic edges, where half of these edges for the same input move to output 1, and half to output 0.

Get observations, and update \(N\) by removing edges that are not compatible with the observations received.

20.7. Playing Zendo with NFA 2 πŸ”—

As you get in observations, iteratively construct an NFA \(N\). For each new observation, add all nodes and edges which would produce the concrete result we have observed.

Then run \(N\) on the newest observation. Now select all paths through the NFA which produce a wrong result. For each such path, delete the last edge of the path in \(N\).

Note that if we get observations of the form \(\lbrack\rbrack\), \(\lbrack 0\rbrack\), \(\lbrack 1\rbrack\), \(\lbrack 10\rbrack\), etc., then the last edge is always the wrong one, because taking one less step must have produced a correct result, otherwise the edge would have been pruned in the previous iteration.

If we don't iterate like this, we could search over the string to find the shortest prefix that produces a wrong result.

Note that with this method we can throw away our observations without losing information, because our NFA is in a sense a perfect model of the observations we have received so far (though new observations can invalidate parts of the model). This is a general desideratum we want for modeling procedures.

20.8. FSA limitations πŸ”—

FSAs can't compute functions where the size of the state you need to keep track of is proportional to the size of the input. This is because the states are hardcoded in the FSA architecture itself. There is no mechanism to dynamically create more state, e.g. by pushing something onto a stack.

Example of a set of strings that FSAs cannot recognize: \(\left\{ 0^{n}1^{n}|n \geq 0 \right\}\) (bitstrings of length \(2n\) where the first half is 0s and the second half is 1s).

20.9. Zendo Model creation πŸ”—

As previously discussed, we could generate a set of FSAs that have the same input-output behavior and then try to generate a property they share.

Another approach could be to generate an FSA \(f\) and have it generate some input-output pairs \(o\). Then we want to find a property \(P\) of the form: \[P(f) ≔ \left. \left( P_{1}(f) \Rightarrow \forall x \in {\mathbb{B}}^{\ast},P_{2}\left( f(x) \right) \right) \right..\]

In the simplest case \(P_{1}\) could be β€œit's exactly the FSA \(f\)”, and \(P_{2}\) could be the property encoding exactly the behavior observed in \(o\).

Now we want to generalize \(P_{1}\). What are the properties that are actually needed to make \(P_{2}\) true?

20.10. Logic Thing πŸ”—

  • How do you get the predictive properties?
    • Sample a bunch of FSA.
    • Generate a bunch of input output behavior.
    • Represent the FSAs and the inputs/outputs as a single logical formula.
    • Simplify the formula
      • Simplify using logic rules
      • Simplify by cutting stuff, e.g. ignoring a particular state in the FSA, and at the same time ignoring the input/output pairs affected by this simplification.
    • Simple formulas on FSA describing complex input output behavior are good.
  • How do you use them?
    • Given an observation (input-output pair), find the predictive properties that are compatible with the observation, and compatible with what we know about the FSA so far.
      • It feels like we could just use automatic resolution.
    • Try to falsify the hypothesis.

The idea is that we want to find properties which tell us small bits about the FSA we are trying to infer. The predictive properties should facilitate this. Contrary to Solomonoff induction, we don't want to just guess the entire FSA all at once. Instead, we want predictive properties that allow us to correctly infer that the FSA must have certain properties, such that we can infer more and more of these properties, building an increasingly coherent picture of what the actual FSA must be like. Each property constrains what the FSA can be.

A predictive property \(P\) is of the form \[P(f) ≔ P_{1}(f) \Rightarrow \left\lbrack \forall x \in {\mathbb{B}}^{\ast},P_{2}\left( x,f(x) \right) \right\rbrack\] where \(f\) is an FSA.

(Again note that if we figure out the overall algorithm, we could use it to solve the specific subproblem here (which is part of the algorithm) of generating hypotheses about what the best experiment is to perform to get more evidence on whether a predictive property does in fact always hold.)

20.11. My understanding πŸ”—

  • we have a long list of tuples \(\left( P_{1},P_{2} \right)\) where \(P_{1}\) is a predicate for FSAs, and \(P_{2}\) is a predicate for input-output pairs, and we have \(P(f) ≔ P_{1}(f) \Rightarrow \left\lbrack \forall x \in {\mathbb{B}}^{\ast},P_{2}\left( x,f(x) \right) \right\rbrack\)
    • example of a \(P_{2}\) predicate: \(P_{2}(i,o) ≔\) β€œ\(o = 1\) if \(i\) ends with β€˜00’”
  • we can filter that list for tuples where \(P_{2}\) holds for what we actually observed
  • this gives us a list of predicates, \(P_{1}\), which could hold for the FSA we're looking for
    • we use these as hints
  • then we try to falsify the \(P_{1}\) predicates
    • do we know how to do this?

20.12. Lower bound on the number of FSA states based on cycle lengths πŸ”—

Let \(f\) be a DFA. If we input strings of just ones, then the outputs of \(f\) must eventually repeat, with a cycle length \(c \in {\mathbb{N}}\) that is smaller than or equal to the number of states of \(f\). The cycle length gives you a lower bound on the number of states.

E.g. if the behavior is

Input Output
0 1
1 0

We know that \(f\) must have at least 2 states.

Input Output
1 0
11 0
111 1
1111 0
11111 0
111111 1

This means \(f\) has at least 3 states. Note that the output could look a lot more random. Then it's useful to check for the cycle:

Input Output
1 1
11 0
111 1
1111 0
11111 0
111111 1
1111111 0
11111111 1
111111111 0
1111111111 0
11111111111 1

Here the cycle is \(10100\).

We can do the same with input strings of only zeros.

Note that this is an observational property, meaning we can infer something about our FSA based on observed input-output behaviour.
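A minimal Python sketch of extracting this lower bound from an output sequence. It assumes the observed window is already periodic (i.e. past any transient), so a plain period check suffices:

def min_period(seq):
    # Smallest p with seq[i] == seq[i + p] for all valid i.
    for p in range(1, len(seq)):
        if all(seq[i] == seq[i + p] for i in range(len(seq) - p)):
            return p
    return len(seq)

outputs = [1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1]  # f(1), f(11), f(111), ... from the table above
print(min_period(outputs))  # 5, so f has at least 5 states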

20.13. Approaches so far πŸ”—

20.13.1. Embedding an FSA πŸ”—

As written in the distillation phase 1 log, we can interpolate between input bitstrings to observe when things start to break, i.e. compute the boundary.

Maybe the same can be done with logical sentences or other datastructures that could represent properties.

20.13.2. Julia's algorithm πŸ”—

Do a Solomonoff induction approximation, together with generating desired observations based on what would eliminate the most probability mass.

20.13.3. Predictive properties πŸ”—

Investigate and automatically learn things about FSAs by generating predictive properties, which can be used to form hypotheses which can then be tested.

20.14. My (Thomas') thoughts when trying to solve FSA Zendo πŸ”—

  • how to play FSA Zendo as an AI
    • you (the AI) should already have some idea what programs are
      • you have a native representation of computation somehow
    • before you (the AI) start, look at the definition of FSA and compare it to your native computation to see where FSA is limited
      • maybe think of some simple rules for the ways in which FSAs are more limited than universal computation
    • you look at the observations so far and think of the simplest hypothesis (in your native computation representation) and check whether an FSA can actually represent it
    • as you always do with hypotheses, try to find conflicting and supporting evidence for it
      • your native computation should be able to do something like β€œfind a short input which makes this function return false”
        • this would ironically be incredibly easy with an FSA, because you can just do backwards search
          • but you don't know this necessarily
      • in general, your hypotheses are probably stored in a way where it's efficient to generate possible evidence (for and against them)
        • this seems very desirable
  • is β€œfinite-state-automata AI” an interesting concept?
    • an AI which can natively only use FSA computation and cannot do anything Turing-complete
    • that is, the AI itself could be written in a Turing-complete language, but in its β€œthoughts” it can only consider FSA hypotheses: hypotheses that can be expressed with an FSA

20.15. Manual Predictive Property Generation πŸ”—

Let \(f\) be an FSA that takes in bitstrings and has one state, which is always rejecting.

Example input output behavior \(o\) (let this be a dictionary):

Input Output
[] 0
11 0
01 0
1001 0
11101 0

Dumb predictive property: \[P\left( \text{fsa},\text{input} \right) ≔ \text{fsa} = f \Rightarrow \left( \text{input} \in o.\text{keys} \Rightarrow f\left( \text{input} \right) = o\left\lbrack \text{input} \right\rbrack \right)\]

We can split this further into: \[\begin{aligned} P_{1}\left( \text{fsa} \right) & ≔ \text{fsa} = f \\ P_{2}\left( \text{input} \right) & ≔ \text{input} \in o.\text{keys} \Rightarrow f\left( \text{input} \right) = o\left\lbrack \text{input} \right\rbrack \\ P & ≔ \forall g \in \text{FSA},\ P_{1}(g) \Rightarrow \forall y \in {\mathbb{B}}^{\ast},P_{2}(y) \end{aligned}\]

We could split this even further: \[\begin{aligned} P_{\text{FSA}}\left( \text{fsa} \right) & ≔ \text{fsa} = f \\ P_{\text{input}}\left( \text{input} \right) & ≔ \text{input} \in o.\text{keys} \\ P_{\text{conclusion}}\left( \text{input} \right) & ≔ f\left( \text{input} \right) = o\left\lbrack \text{input} \right\rbrack \\ P & ≔ \forall g \in \text{FSA},\forall y \in {\mathbb{B}}^{\ast},\ P_{\text{FSA}}(g) \land \left\lbrack P_{\text{input}}(y) \Rightarrow P_{\text{conclusion}}(y) \right\rbrack \end{aligned}\]

Now if we see that \(P_{\text{input}}\) and \(P_{\text{conclusion}}\) are true for the observed input output behavior, we know that \(P_{\text{FSA}}\) could be the cause of this.

20.16. Generating and Composing Predictive Properties: how to generate, compose, and use them to form hypotheses πŸ”—

20.16.1. Reducing the Predictive Property πŸ”—

Now, if we can manage to simplify \(P_{\text{FSA}}\) we only win. Currently we are checking if the input FSA is equal to \(f\). However, instead we could try to find states in \(f\) that don't have any impact.

One example of such states would be the ones that were not visited while generating the example outputs. These can just be ignored.

There are also algorithms for rewriting FSAs into a normal form.

A more general method, that would work with a wide range of datastructures, is to use the concept of reducers from property-based testing. We try to reduce the datastructure that describes \(f\), e.g. by removing a state. Then we check if the reduction still generates the same input-output behavior that we elicited previously.

In general, a reducer takes a datastructure and generates a new datastructure in which some value gets updated towards its base case. For example, if we have a natural number, we update it towards 0; if we have a list, we remove the last element. This guarantees that we make the thing simpler. At each reduction step, for each new datastructure, we check whether the property still holds, so each step needs only one check.

The reduction step is related to a simplicity prior: β€œAh, I observe this thing, so probably this simple thing (because of the reduction) is doing that.”
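A minimal Python sketch of one concrete reducer, the β€œdrop the states that were not visited” idea from above, with the same dict representation as the earlier sketches; also dropping transitions that leave the kept set is a simplification of mine:

def visited_states(fsa, inputs):
    # States touched while running the given input strings.
    seen = {fsa['start']}
    for inp in inputs:
        s = fsa['start']
        for b in inp:
            s = fsa['delta'][(s, b)]
            seen.add(s)
    return seen

def drop_unvisited(fsa, inputs):
    # Reduce the FSA to the sub-automaton actually exercised by the observations.
    keep = visited_states(fsa, inputs)
    return {'states': keep,
            'start': fsa['start'],
            'delta': {k: v for k, v in fsa['delta'].items()
                      if k[0] in keep and v in keep},
            'omega': {s: o for s, o in fsa['omega'].items() if s in keep}}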

20.16.2. Compositional FSAs πŸ”—

The simplest way in which we could make use of these reduced predictive properties is by interpreting them as describing components. We don't say that the FSA we are trying to infer is maybe exactly \(f\); instead we say that \(f\) maybe constitutes a subgraph of the FSA we are trying to infer.

The prime-numbers desideratum: the predictive properties that we store should be such that no property can be constructed in a few steps out of the other properties.

In the simplest version of this, we could say that the FSAs associated with each rule should be such that none of them can be constructed by combining FSAs of other predictive properties in our database.

Also: focus on those predictive properties that are often useful for constructing other predictive properties; otherwise you could end up with a lot of properties that are very unique but actually not often helpful.

In summary, you want the <predictive properties/FSAs> that allow you to construct the most new predictive properties, while not containing redundancy, for some fixed storage size you allocate for storing this knowledge.

This overall setup seems very general, and would also work if we were not trying to infer FSAs.

20.16.3. Composing the correct FSA πŸ”—

We are playing the game. Assume we have some good predictive properties.

We now iteratively generate a bitstring \(S\) of length \(n\). In each step we append a randomly sampled bit. Each time before we add a bit, we run the string we have so far on the FSA we are playing against.

As we grow our random bitstring, we also feed it to all of our predictive properties, so that we can see which FSAs we know of would produce the input-output behavior we saw so far. We continue this procedure until we have eliminated all but one automaton that would produce this bitstring. We then mark down the bitstring as a segment, associated with the automaton we know generates it.

Then we start a new segment, and repeat this procedure.

We end up splitting our input into segments and associating each segment with a specific FSA.

After some number of steps, we combine all these automata into one and reduce it into normal form, to get our final hypothesis.

It feels like maybe we could detect intelligently when to stop.

It feels like this setup could be improved.

This procedure allows you to construct a hypothesis about the game FSA using only knowledge about very small FSAs.
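A rough Python sketch of the segmentation loop. It assumes the opponent FSA can be queried on each segment in isolation (a simplification; the log leaves the exact bookkeeping open) and reuses the dict representation from the earlier sketches:

import random

def run(fsa, bits):
    # Run an FSA (dict representation) on a list of bits.
    s = fsa['start']
    for b in bits:
        s = fsa['delta'][(s, b)]
    return fsa['omega'][s]

def segment(candidates, oracle, total_bits, seed=0):
    # Grow a random bitstring; close a segment whenever exactly one
    # candidate FSA still matches the oracle on it, then start fresh.
    rng = random.Random(seed)
    segments, current = [], []
    live = list(range(len(candidates)))
    for _ in range(total_bits):
        current.append(rng.randint(0, 1))
        observed = oracle(current)
        live = [i for i in live if run(candidates[i], current) == observed]
        if not live:  # no known FSA explains this segment; start over
            current, live = [], list(range(len(candidates)))
        elif len(live) == 1:
            segments.append((tuple(current), candidates[live[0]]))
            current, live = [], list(range(len(candidates)))
    return segments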

Question:

  • Could we just generate the set of predictive properties that describe small FSAs (e.g. fewer than 4 states)? Would this be sufficient? How far can you go with this?
  • Can you hierarchically combine your predictive properties to generate, in advance, more predictive properties that are more efficient (or at least have much lower description length)?

20.17. FSA and World State Graph πŸ”—

It seems like a World State Graph is an β€œinfinite FSA”: the states of the automaton are the world states and the transitions correspond to the actions.

FSA World state graph
State State
Transition Action
Alphabet Set of actions
Initial state Starting state
Accepting state Target state
Input sequence Plan

20.18. Summary πŸ”—

Some major things we did (not exhaustive):

  • We tried to understand how to generate predictive properties
    • We thought about how to write observed behavior down in logic and then simplify it, to get a good predictive property.
    • We understood how NFAs can model DFAs, and how we could iteratively construct an NFA representing our hypothesis, based on the observations.
  • Simple predictive properties
    • We figured out that we can have a predictive property that just describes exactly the observed input output behavior.
    • We can then interpret the property as specifying just some part of the structure (e.g. the predictive property FSA is a subpart of the actual FSA we try to infer).
    • We understood how to compose many small such predictive properties to form an overall hypothesis, by matching FSAs to a single long observed bitstring.

20.19. Next Steps πŸ”—

  • Implement the FSA composing algorithm described in (??, a).
    • Write down a structured description of the algorithm in math.
    • Implement an MVP in Clojure, which would involve generating predictive properties based on all FSAs of size \(n\) or smaller.

20.20. Reflection πŸ”—

  • I have no idea what I am doing. But then somehow I do good things (things seeming good in retrospect). Specifically, I did not know that we could get to the algorithm in (??, a). But then we did, because we tried.
  • FSAs so far seem to be the right choice because they are easy to reason about, to write about formally, and to code up.
  • Sometimes we talked very abstractly until finally we did a concrete example which was then very useful.

20.21. The Core Idea πŸ”—

We're trying to create an algorithm that, given some rudimentary knowledge about the system, for example an FSA, uses that knowledge to study the system such that it learns to infer what the concrete system is that I'm looking at, given some input-output behavior that I'm observing.

This is step one, where the system learns based on foundational knowledge. Step two is actually playing the game. But while you play the game and get new information about the concrete finite automaton that you're trying to infer, you can use a similar reasoning process to learn more about the specific automaton you're trying to find.

20.22. Misc πŸ”—

21. 2024-08-25 - IA πŸ”—

21.1. Predictive Property πŸ”—

A predictive property is a logic statement of the form: \[\begin{array}{r} \forall f \in \text{ WorldModel},\forall o \in \left\{ \left( x,\text{ observe}\left( f(x) \right) \right)~|~x \in \text{ WorldState} \right\}, \\ P_{1}(f) \Rightarrow P_{2}(o) \end{array}\] where WorldModel is the set of functions from \(\text{WorldState } \rightarrow \text{ WorldState}\).

Alternatively we could have WorldModel be \(\text{WorldState } \rightarrow \text{ Action } \rightarrow \text{ WorldState}\) :todo: \[\begin{array}{r} \forall f \in \text{ WorldModel},\forall o \in \left\{ \left( \text{List (Action} \times \text{Observation)},\text{ observe}\left( f(x) \right) \right)~|~x \in \text{ WorldState} \right\}, \\ P_{1}(f) \Rightarrow P_{2}(o) \end{array}\] In Lean:

-- We need this import to use set-builder notation
import Mathlib.Data.Set.Basic

-- We introduce the necessary types
opaque WorldState : Type
opaque Observation : Type
axiom observe : WorldState → Observation
def WorldModel := WorldState → WorldState

-- Given a world model, stateObservations is the set of all possible
-- input-output behaviors.
def stateObservations (f : WorldModel) : Set (WorldState × Observation) :=
  { (w, observe (f w)) | w : WorldState }

-- A predictive property asserts something about the input-output behavior
-- of a world model, given that the world model satisfies some other property.
def predictiveProperty (P₁ : WorldModel → Prop) (P₂ : (a : WorldModel) → stateObservations a → Prop) : Prop :=
  ∀ f : WorldModel, ∀ o : stateObservations f, (P₁ f) → (P₂ f o)

A predictive property says that if a property \(P_{1}\) holds for a world model, then the world model generates observations for which \(P_{2}\) holds.

We want to use predictive properties to form hypotheses about the real world, using our world model. Note that the WorldModel type is the type of all possible world models we could consider, and not any particular world model.

The goal is to figure out which concrete world model from WorldModel describes reality.

To make things more concrete let's consider that our world model is an FSA. \[\begin{array}{r} \forall f \in \text{ FSA},\forall o \in \left\{ \left( b,f(b) \right)~|~b \in f.\Sigma^{\ast} \right\}, \\ P_{1}(f) \Rightarrow P_{2}(o) \end{array}\]

where \(f\) is an FSA, and \(o\) is an observed input-output pair from \(\left\{ \left( b,f(b) \right)~|~b \in f.\Sigma^{\ast} \right\}\).

Predictive properties can be used to generate hypotheses when playing bit-FSA Zendo. :todo:

Generate the set of predictive properties by :todo:

22. 2024-08-26 - Maren πŸ”—

22.1. What makes a Game be about learning: A collection of increasingly complex learning setups πŸ”—

Each game of Zendo is about learning the generating rule. Counter-Strike is in large part about aiming at the enemy's head. Inferring a rule and aiming seem qualitatively different.

Let's first consider the task of computing \(3 + 5\). There is only one answer: \(8\). And this answer takes the form of a piece of data. Now consider the task of adding two numbers in general.

:todo:

How is this different from having perfect aim in a shooter?

  • In a limited sense you need to β€œlearn” how to shoot somebody in each situation.

In both situations, there is some problem \(P\) to solve: <\(p_{1}\): inferring the rule / \(p_{2}\): aiming at the enemy's head>. Initially we can't solve \(P\). What does the best algorithm \(a^{\ast}\) for solving \(P\) look like?

\(p_{3} ≔ \text{ compute }5 + 3 = ?\)

\(p_{3}\)'s solution is \(8\), a constant value. But \(p_{2}\)'s solution can't be a constant, because there are too many situations in which you need to shoot somebody. You need an algorithm to calculate how to aim given the current situation.

In the simplest case you know where you are, where you are looking, where the enemies head is and then use linear algebra.

(also bullet dropoff)

For \(p_{3}\) knowing the solution is about remembering a concrete result. For \(p_{2}\) the solution is remembering how to do something.

For \(p_{1}\) we also need to find an algorithm that finds a solution. However, the computational steps that are used to play Zendo are different for each game. If you get different observations, you form different hypotheses. Computing how to test a hypothesis differs based on the concrete hypothesis. Therefore, to solve the problem we need to compute the best strategy, meaning that while solving the problem we create new algorithms, whereas in \(p_{2}\) we didn't need to create a new algorithm while trying to aim.

Again this is because you can't precompute for every possible hypothesis how to evaluate it.

22.1.1. Making Rules πŸ”—

Let's consider another problem \(p_{4}\): playing Baba is You. As part of the game you manipulate the rules of the game. This means it's not sufficient to have an algorithm which infers the rules of the world. Additionally you need an algorithm which computes how to modify the rules of the world to achieve your goal.

This is actually a very important problem. Programming is all about creating a world with the right rules.

22.1.2. Companies πŸ”—

There is another problem \(p_{5}\) which is making policy. You want to pass laws to make companies behave well. The difference here is that your rules sort of need to be perfect. Presumably you don't care that much that any particular company gets filthy rich, but that's approximately their goal. If there is a loophole in the law then the companies will exploit it.

They might know that cigarettes are cancer-inducing, and that people don't want to have cancer, but still sell them, because doing so is legal. Even while knowing that eventually people will figure it out and pass laws preventing this. And the rest is history.

It's useful to consider the limit case where the company is infinitely smart. If your rules still have the desired effect, then your rules are good.

This would be the static setup. However, similar to Baba is You, we can consider the case where the right ruleset depends on the current context. Specifically, we can imagine the situation where no best rule exists, because what the best strategy is depends on what strategy your opponent is using. And now we get into game-theoretic territory (e.g. the prisoner's dilemma).

22.1.3. Contrived Game πŸ”—

Let \(R\) be a set of rules.

When the game starts, randomly sample an \(r \in R\).

22.1.3.1. Concrete πŸ”—

To make it more concrete, let there be a rule deck \(R\) of 64 cards, each containing a rule. To play the game, randomly draw 6 cards. These cards combined form the rules of the game, such that there are \(\binom{64}{6} = 74974368\) possible rulesets.

The rules are hidden from the players.

Example Rule: You can trade arbitrarily with other players, exchanging play stones.

Let there be an infinite supply of play stones in 4 colors: red, green, blue, and purple.

Let there be also a goal deck of

Example Card: Every blue stone gives you 1 point.

A player can always ask whether gaining or losing a number of play stones (all happening in one transaction) would increase or decrease their score.

There are too many β€œpossible worlds”, such that knowing \(R\) isn't enough to

22.1.3.2. You need to understand how to play while you play πŸ”—

This game demonstrates that, because there are so many possibilities for what the actual game could be, you cannot pre-compute the optimal policy for every possible ruleset.

That means you are forced to compute the actual optimal policy based on concrete information you find out about the particular game you are playing while you are playing the game.

23. 2024-08-26 - Thomas πŸ”—

23.1. Iterating all \({\mathbb{N}}^{\ast}\) Vectors πŸ”—

How can we iterate all vectors of natural numbers, denoted as \({\mathbb{N}}^{\ast}\)? Let's start by considering how we can iterate \({\mathbb{N}}^{2}\):

00
01
11
10
02
12
22
20
21
03
13
23
33
30
31
32

The rule for iteration is most easily explained by a diagram:

omega_iteration.png

We start in the top-left corner, and then iterate through the elements in the green L-shaped pieces. In each L-shaped piece we first iterate through the elements indicated by the cyan arrow, and then the ones indicated by the purple arrow.

Describing this algorithm in words makes it easy to see how we can implement it. We hold the right digit fixed at \(x\), with \(x = 1\) initially. Then we generate all pairs where the left digit is in \(\left\{ 0,\ldots,x \right\}\). We repeat this with the roles reversed, fixing the left digit to be \(x\) and generating the pairs where the right digit is in \(\left\{ 0,\ldots,x - 1 \right\}\). Note the \(x - 1\): we don't need to generate the pair \((x,x)\) twice.
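A minimal Python sketch of this iteration; the generator reproduces exactly the \({\mathbb{N}}^{2}\) listing above:

import itertools

def iter_pairs():
    # Yield all of N^2 in L-shaped shells: 00, 01, 11, 10, 02, 12, 22, 20, 21, ...
    yield (0, 0)
    x = 1
    while True:
        for left in range(x + 1):   # right digit fixed at x (cyan arrow)
            yield (left, x)
        for right in range(x):      # left digit fixed at x (purple arrow)
            yield (x, right)
        x += 1

print(list(itertools.islice(iter_pairs(), 9)))
# [(0, 0), (0, 1), (1, 1), (1, 0), (0, 2), (1, 2), (2, 2), (2, 0), (2, 1)]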

We can easily extend this procedure to iterate \({\mathbb{N}}^{3}\):

000
001
001
011
111
101
022
122
222
202
212

This method extends to any \({\mathbb{N}}^{n}\) with \(n \in {\mathbb{N}}\).

We can combine all these different iteration procedures into one, such that we end up iterating:

0
1
00
01
11
10
2
02
12
22
20
21
000
001
001
011
111
101
3 ; (list of length 1 up to 3)
X ; here insert lists of length 2 up to 3, lists of length 3 up to 2, lists of length 4 up to 1
4
5

Or more readably:

  1. list of length 1 containing elements from \(\left\{ 0 \right\}\)
  2. list of length 1 containing elements from \(\left\{ 0,1 \right\}\)
  3. list of length 2 containing elements from \(\left\{ 0 \right\}\)
  4. list of length 1 containing elements from \(\left\{ 0,1,2 \right\}\)
  5. list of length 2 containing elements from \(\left\{ 0,1 \right\}\)
  6. list of length 3 containing elements from \(\left\{ 0 \right\}\)
  7. list of length 1 containing elements from \(\left\{ 0,1,2,3 \right\}\)
  8. list of length 2 containing elements from \(\left\{ 0,1,2 \right\}\)
  9. list of length 3 containing elements from \(\left\{ 0,1 \right\}\)
  10. list of length 4 containing elements from \(\left\{ 0 \right\}\)
  11. list of length 1 containing elements from \(\left\{ 0,1,2,3,4 \right\}\)
  12. list of length 2 containing elements from \(\left\{ 0,1,2,3 \right\}\)
  13. list of length 3 containing elements from \(\left\{ 0,1,2 \right\}\)
  14. list of length 4 containing elements from \(\left\{ 0,1 \right\}\)
  15. list of length 5 containing elements from \(\left\{ 0 \right\}\)

Note that we can change this arrangement to bias the iteration. For example, we could only add the next lists of longer length every 4 steps:

  1. list of length 1 containing elements from \(\left\{ 0 \right\}\)
  2. list of length 1 containing elements from \(\left\{ 0,1 \right\}\), minus the lists already generated
  3. list of length 1 containing elements from \(\left\{ 0,1,2 \right\}\), minus the lists already generated
  4. list of length 1 containing elements from \(\left\{ 0,1,2,3 \right\}\), minus the lists already generated
  5. list of length 2 containing elements from \(\left\{ 0 \right\}\)
  6. list of length 1 containing elements from \(\left\{ 0,1,2,3,4 \right\}\), minus the lists already generated
  7. list of length 1 containing elements from \(\left\{ 0,1,2,3,4,5 \right\}\), minus the lists already generated
  8. list of length 1 containing elements from \(\left\{ 0,1,2,3,4,5,6 \right\}\), minus the lists already generated
  9. list of length 1 containing elements from \(\left\{ 0,1,2,3,4,5,6,7 \right\}\), minus the lists already generated
  10. list of length 2 containing elements from \(\left\{ 0,1 \right\}\)
  11. list of length 3 containing elements from \(\left\{ 0 \right\}\)

This would make us β€œexplore” the dimension of having different elements in the list more, and β€œexplore” lists of longer lengths less.

This overall approach should generalize to a wide range of datastructures. For example, it trivially generalizes to tensors, because any finite tensor can be transformed into a vector.

23.2. Solomonoff induction πŸ”—

Here is a sort of weird sine function which Solomonoff induction would consider while a human wouldn't. \[f(x) ≔ \begin{cases} \sin(x)\text{ if }x \leq c_{1} \\ c_{2}\text{ otherwise} \end{cases}\] Note that you can change \(c_{1}\) without affecting program length that much, especially if you pick values like \(H_{10}(5)\), where \(H_{10}\) is the 10th hyperoperation.

As the inductor gets more and more observations, it would converge to assigning all probability to the simple sine function.

23.2.1. Problems with solomonoff induction πŸ”—

  • So, this manner of induction cannot be approximated. If you randomly sample hypotheses, then you don't by default have a way to notice whether your hypothesis is close to the correct one. You don't know how to update towards the correct one from one that is close to it.
  • But to make world-modeling fast, we want a procedure that can iteratively get closer and closer to the correct hypothesis while only considering a very limited number of hypotheses at any point in time, instead of every possible hypothesis conceivable by a computational system, like Solomonoff induction does.
  • We don't want to explicitly filter our hypotheses; we want a reasoning procedure that only ever considers hypotheses that make sense.

23.3. Reflection πŸ”—

  • Iterating over all \({\mathbb{N}}^{\ast}\) vectors seemed like a good thing to figure out.
  • We could have been better at explicitly setting targets we want to reach.
  • Clojure seems still good.

23.4. Next Steps πŸ”—

  • Benchmark how fast a Solomonoff inductor can find the FSA Zendo solution
    • Number of observations
    • For what number of states does it become intractable?
  • Check variations of the Solomonoff inductor
    • Optimize to generate the next observation to eliminate maximum probability mass.
    • Have the Solomonoff inductor run with the compute-boundary observations instead of just the first \(n\) observations.
      • How much better is this than explicitly computing how to eliminate the most probability mass? How much less compute do we need? How much performance do we lose (i.e. how many more observations do we need on average)?
  • Implement the better algorithm of just looking at parts of the bitstring, using predictive properties to construct an overall hypothesis, from many small FSA parts that you do understand.

24. Julia 2024-08-27 πŸ”—

24.1. Program search Doesn't work πŸ”—

  • So this mode of induction doesn't work, because the space of programs it considers is too large.
    • The strategy of cutting the probability mass in half with each query would actually be optimal for figuring out the correct world in the fewest observations, but the problem is we can't do this because there are too many programs.
    • Approximating this procedure by just randomly sampling some programs also doesn't work, because only the correct program will be accepted, and there is no way for Solomonoff induction to know whether a hypothesis is close to correct or not. It can only re-sample more random programs, but this is so inefficient, and there are so many programs, that this procedure doesn't converge on the world-state program.
    • This is at least my intuition.

Humans do something smarter; they don't just throw away all the hypotheses.

Julia: Picots have a genetic algorithm that samples new programs by combining existing ones.

Julia's topic: i think the topic i wanna try to think about tomorrow is something like: if I have a bunch of β€œoptimization power” in a bucket and shake it, out pops demons & superints. why, and what are the mechanics of this?

  • example1: put physics in a universe and shake. boom optimizers (humans)
  • example2: put fitness selection pressures on a planet and shake. boom optimizers (humans?)
  • example3: put a bunch of meta-search in a computer. boom? optimization demons???

im happy to think about something else if you have topics you are excited about

Johannes:

  • What counts as optimization power here.

Shaking molecules creates configurations such that these are self replicating.

  • You can't evolve a salt crystal.

In a computer program, why do you need a function definition, such that you then say these instructions are for this function? If you didn't have that, you couldn't have a mechanism to refer to and reuse the function. The function definition doesn't actually do anything in your program. You could rewrite your program such that you don't have it and just inline all the code, but that maybe makes your code so big that it's now impractical.

An RNA molecule needs to have the structure that holds all of the bases together.

But these auxiliary structures are not the bit that is evolved upon (though making the RNA longer makes there be more structure).

There is an important point here: you want the RNA's basic auxiliary structure to not be changeable, only the base pairs. You want the RNA structure to be such that you can have continued good self-modification that doesn't break you. If you change your auxiliary structure such that your molecule just breaks apart, then you are completely fucked. It is still possible to fuck yourself by getting wrong base pairs such that the self-replication breaks, but this is sort of at a different level.

Evolution is very stupid. SGD is smarter. So SGD can still find an intelligence that would kill us. The question is how much it takes.

A genetic algorithm on a set of algorithms, where these algorithms have type signatures such that we know which genetic combinations are valid.

There can be carnivore peptides that eat other peptides and turn them into themselves.

It's easier to invent hunger than "this is genetic fitness, maximize it".

24.2. Program search πŸ”—

Let's say that we play Minecraft. I want to have an agent that optimizes goal \(G\). \(G\) is the perfect specification of what I want. I now do program search to find the first alg

We perform program search to select the AI, but we cannot actually select the AI by running it on every single possible Minecraft world and checking that it performs correctly in each. How do we still get an AI that actually wants \(G\) instead of not wanting \(G\)? It seems like you can't do that. Possibly.

We also assume, for now, that we can actually run the algorithm for all possible Minecraft worlds, evaluate whether the goal is good, evaluate whether the algorithm is actually good, and that it might then still kill you.

The algorithm for intelligence might be pretty simple, but pretty hard to find. Hardcoding a bunch of heuristics is easier to find, but much more complicated as a program.

If we have a process which finds the easy thing first, something more like SGD, then you would get a system that is not aligned with you but would look aligned if we look at only a few observations. And the problem is that maybe if we train the system more, and it gets better and better, then at some point it becomes so smart that it realizes the situation it is in and that it has certain goals, and it then deceptively optimizes for the thing that you want, while not having goals completely aligned with yours, such that its goals would now also not be updated anymore.

Actually, if you could, for literally every single possible world state, have the AI behave exactly like the goal specifies, then it would actually be aligned. There is no possibility for it not to be aligned, even if it was constantly internally thinking about killing you. That would be fine, because it just always behaves correctly, no matter how it works internally.

  • Go on the specification
  • I cannot evaluate every possible situation in my training.
  • What constraints does the training put on possible minds produced?
  • The AI with a goal slot.

Humans can reflect and then crystallize into certain goals. It seems more natural to crystallize into "everybody should be happy" than "everybody should have maximum freedom".

What I (Johannes) did was to look at my own conscious experience. What are my preferences over my own experience? Then I just extend these preferences to every conscious experience that exists.

The mind contains preferences which could probably tell you how much you like any possible world state, but in the form of some algorithm that could eventually evaluate a world state, not in a form where you know the answer right now.

Humans do feel a certain way if they see a drowning bird. But they cannot conceive of having 10,000 birds drown and how they would feel about it. So they need an algorithm in order to extrapolate this feeling. And there are many kinds of algorithms like that. One of them would be to actually self-modify so that I don't care at all.

And it is possible, while not understanding everything, not being omniscient in knowledge, to think that you want a certain thing, and then modify yourself such that you actually want that thing. Your preferences would then have been locked into some particular path, and it is possible that had you thought for a much longer time about whether you actually wanted the self-modification, you wouldn't have done it.

This is a sort of existence proof that you can converge onto different paths even if you start with exactly the same mind, and the only difference is how long you consider performing a particular self-modification.

You can have an AI with preferences that you like. The preferences are only in a local scope though. How can you know that the AI would eventually arrive at the correct values by generalizing the current ones?

What is scope? With a human, it seems like the scope in some sense is unlimited: you can think of any concrete situation, and eventually you will probably conclude "I care this much about it" or "it's this good". But it is scoped in the sense that it is really easy to know if you're hungry, whereas it's really hard to know, for an arbitrary world state that you don't even understand (because there's lots of physics you never even heard about), whether it is good or not.

Music is an example of a physical phenomenon that a human can assign certain values to, because when they hear, i.e. interact with, that physical phenomenon, it might make them feel a certain way. But it seems very arbitrary. An alien might like different music, music that would sound like noise to us. For people who were born deaf and then got hearing aids, music sounds like noise. This is an example of a phenomenon that humans care about, but there isn't really some instrumental reason that we should definitely hear a major note in this particular way, with it eliciting this specific emotion. You wouldn't expect an alien race to have the exact same response to the exact same music, whereas you would expect them to have the same mathematics.

If you go to an Amazon tribe that doesn't hear Western music and you play minor chords, they will say it doesn't sound sad. Which kind of music elicits which emotional reaction seems to be, potentially, a learned thing.

24.3. Village Doctor πŸ”—

It is possible to start with some preferences that are, let's say, localized in space: I care about these people around me. Then, extrapolating your preferences in maybe the same way, you come to care about all the people on the entire planet being healthy. But that now maybe implies an action like going away from your town, such that more people would die there, while overall fewer people would die, because you teach more doctors at some school you set up.

And so the outcome after you go away would look terrible if you just evaluate it based on the previous scope, where you only care about the people around you.

The preferences over the people around you haven't changed though. The only thing that changed is the scope.

If you take your new objective of caring about the entire world and run it only on the local scope of the village that you came from, it gives you the same result. But if you now also calculate it for the rest of the world, then the two combined imply different policies than before.

Imagine you have a utility function \(u\), and you pick some small state space of the world of Minecraft and run the utility function over the world state just of this small state space. If you then take the same utility function and run it over the entire world of Minecraft, you will get different policies.

The policy changes because the scope in which we are evaluating the utility function changes.

24.4. How to limit scope? πŸ”—

We have a training procedure for a system. We train the system on some small, localized scope. Then we observe that it is correct. But now, how do we know that it would generalize to all possible states that it could have preferences over, and still have exactly the correct preferences that we wanted it to have there as well?

By default, if we train a system to have preferences over a small scope, it will also have preferences over the larger scope.

The AI in Minecraft cares exactly about behaving as my Minecraft assistant. We have a proof of that. But the AI also cares a lot about tiling the real universe with bananas. My proof doesn't say anything about that. Therefore the AI will actually try to escape and tile the universe with bananas, because it just cares so much about that.

You can take abstractions and combine them into new abstractions. If you have preferences over your lower-level abstractions, you can probably build arbitrary structures and have preferences over them. Anything you can build with your world model, you automatically have preferences over, given that you have preferences over all the low-level building blocks. Therefore you care about lots of things that are possibly not even in the world, just as humans can imagine situations in their head that are physically impossible.

25. 2024-08-30 πŸ”—

25.1. Draft Deductive World Modeling Game πŸ”—

I give you the laws of the universe. You give me a β€œbrain” that's good at thinking about this universe.

In the world of logic: I give you a set of axioms. Now every logical sentence is either true or false. Roughly, for any true sentence there is a proof that shows that it is true. To figure out whether a logical sentence \(S\) is true or false we can simply enumerate all proofs until we find a proof for \(S\) or \(\neg S\).

But this is fucking slow. It doesn't qualify as a brain that's good at thinking about this universe.

Given a complete set of physical laws of the world, generate a world model that allows you to efficiently reason about this world.

Given a low-level world model, how can you build a high-level one?

:todo:

25.2. Stub 2024-08-30 - Report πŸ”—

This article summarizes my work to date (2024-08-30).

Developing the right step graph.

  • Object World Network
  • What are Abstractions
  • Zendo
  • World Modeling
  • World state graph

:todo:

25.3. The Power of Propositions πŸ”—

Every data structure can be represented as a bitstring, and in turn as a propositional sentence \(S\). In \(S\) each variable represents a bit.

Depending on the datastructure we may need additional constraints. E.g. a positive number might be encoded by constraining the first bit, which represents the sign, to always be 0.

Note that we can always collect two distinct datastructures into a 2-tuple.

Once we have a datastructure encoded, we can add assertions about the datastructure. E.g. that some number \(X\) is larger than a number \(Y\). Really anything that can be said about this datastructure can be said in propositional logic (but it might take many clauses).

Once we have added these logical assertions to the system, we can use a SAT solver to answer questions. E.g.: what are the possible values of \(X_{5}\), given that \(Y_{4}\) holds, and knowing that \(X < Y\)?
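
As a minimal illustration, here is a brute-force sketch of answering such a query for 3-bit numbers (enumerating all assignments instead of calling a real SAT solver; all names are made up):

;; Brute-force stand-in for a SAT query (illustrative): X and Y are 3-bit
;; numbers, each bit a propositional variable, ordered [MSB ... LSB].
(defn bits->num [bits]                     ; [1 0 1] => 5
  (reduce (fn [acc b] (+ (* 2 acc) b)) 0 bits))

(def assignments                           ; all 2^6 assignments
  (for [x0 [0 1] x1 [0 1] x2 [0 1]
        y0 [0 1] y1 [0 1] y2 [0 1]]
    {:x [x0 x1 x2] :y [y0 y1 y2]}))

;; Possible values of X's lowest bit, given that Y's highest bit is set
;; and that X < Y:
(->> assignments
     (filter #(and (= 1 (first (:y %)))
                   (< (bits->num (:x %)) (bits->num (:y %)))))
     (map #(last (:x %)))
     distinct)
;; => (0 1)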

26. Mock πŸ”—

26.1. World πŸ”—

27. 2024-09-01 - Thomas πŸ”—

  • What is intelligence?
  • What is a World Model
    • Desiderata of a world model
      • Allows you to efficiently devise plans that steer the world into particular states
    • Properties of a world model
      • Factored
  • How to build a world model.
    • Deductive world modeling game (??, a)
    • Inferring zendo
  • World Model Constructs (Concrete implementations)
    • World State Graph
      • Detect patterns in the world through graph analysis
  • Object World Network
    • See (??, a) and
    • Glider example
  • Finding computational model for observations
    • aka Solomonoff induction
    • Zendo with FSAs
  • Problems in alignment
    • How to steer an inner aligned superintelligence
    • How to get inner aligned system
      • What are the mechanics by which optimizers arise from optimization pressure (mesa-optimizers)
    • How to make a system want what you want.

28. 2024-09-05 - Matthias G. Mayer - Propositional Logic Function πŸ”—

[This has been integrated into org.]

Let's encode tic-tac-toe such that we can use logical reasoning to analyse the game. For example, we want to find a sequence of moves that wins, given a board state. But the encoding should be general, such that we could also ask: what is the sequence of moves that puts the most stones on the board before somebody wins, and so on.

We need to represent the board state.

A game has states \(S\), actions \(A\).

There is an update function \(u:S \times A \rightarrow S\) and an initial state \(S_{0}\).

Assume for now that the game always ends in a state \(S_{e}\).

Then we can encode the time evolution in propositional logic.

In particular, the choice of actions fully determines a timeline.

Now we can encode any goal as a propositional formula \(G\).

If we solve SAT with this goal, we get an action sequence that achieves that goal.

Consider now this simple update function with \(S = A = {\mathbb{N}}\).

\[u(s,a) = s + a\]

A game is a tuple \(\left( S,A,u,W_{0} \right)\) where \(S\) is a set of states, \(A\) is a set of actions, \(u:S \times A \rightarrow S\), \(W_{0}\) is the starting state.

A goal is a subset \(G \subseteq S\).

We use zero-based indexing.

For a list of actions \(a\) of length \(m \in {\mathbb{N}}\), we recursively define a list of states \(W\) by \(W_{n + 1} = u\left( W_{n},a_{n} \right)\) for \(n \in {\mathbb{N}}_{< m}\), with \(W_{0}\) the starting state.

We say that the action list \(a\) achieves \(G\), if \(\exists n \in {\mathbb{N}}:W_{n} \in G\).

Let \(W ≔ \left( W_{0},\ldots,W_{m} \right)\) denote this list of states.

Clearly, \(a\) determines the list of states \(W\) by definition. Now we can encode this whole process of taking a game and an action list to \(W\) as a propositional formula \(\varphi(a)\). Then a SAT solution of \(\varphi(a) \land G\) gives an action list that achieves \(G\).

More precisely: encode each \(W_{n}\) as a fixed-length bitstring \(B^{n}\), and each action \(a_{n}\) as a fixed-length bitstring \(A^{n}\).

Let \(\varphi\) be the encoding of the transition function, i.e. \[\forall w,v \in S,\forall a \in A,\varphi(w,a,v) \Leftrightarrow u(w,a) = v\]

Let \(\psi\) encode the goal, i.e. \[\forall w \in S:\left( \psi(w) \Leftrightarrow w \in G \right)\]

Then solving the formula \[\left( \bigwedge_{n < m}\varphi(B^{n},A^{n},B^{n + 1}) \right) \land \psi(B^{m})\] yields a list of actions ending up in \(G\) after \(m\) steps.

\[\begin{array}{r} S = \left\{ 0,1,2,3 \right\} \\ f(0) = 1 \\ f(1) = 2 \\ f(2) = 3 \\ f(3) = 0 \end{array}\]

Let \(S\) be represented by the binary variables \(X_{1},X_{2}\):

\[\begin{aligned} S_{i} = 0 & \Leftrightarrow \neg X_{1,i} \land \neg X_{2,i} \\ S_{i} = 1 & \Leftrightarrow X_{1,i} \land \neg X_{2,i} \\ S_{i} = 2 & \Leftrightarrow \neg X_{1,i} \land X_{2,i} \\ S_{i} = 3 & \Leftrightarrow X_{1,i} \land X_{2,i} \end{aligned}\]

for all \(i \in \left\{ 1,2,\ldots,m \right\}\). We can now encode \(f\) as:

\[\begin{aligned} P_{1}(i): \Leftrightarrow \neg X_{1,i} \land \neg X_{2,i} & \Rightarrow X_{1,i + 1} \land \neg X_{2,i + 1} \\ P_{2}(i): \Leftrightarrow X_{1,i} \land \neg X_{2,i} & \Rightarrow \neg X_{1,i + 1} \land X_{2,i + 1} \\ P_{3}(i): \Leftrightarrow \neg X_{1,i} \land X_{2,i} & \Rightarrow X_{1,i + 1} \land X_{2,i + 1} \\ P_{4}(i): \Leftrightarrow X_{1,i} \land X_{2,i} & \Rightarrow \neg X_{1,i + 1} \land \neg X_{2,i + 1} \end{aligned}\]

Matthias' encoding: \[\begin{aligned} \psi(X_{1},X_{2},Y_{1},Y_{2}) ≔ \neg X_{1} \land \neg X_{2} & \Rightarrow Y_{1} \land \neg Y_{2} \\ X_{1} \land \neg X_{2} & \Rightarrow \neg Y_{1} \land Y_{2} \\ \neg X_{1} \land X_{2} & \Rightarrow Y_{1} \land Y_{2} \\ X_{1} \land X_{2} & \Rightarrow \neg Y_{1} \land \neg Y_{2} \end{aligned}\]

\(\Psi(i) ≔ \psi(X_{1,i},X_{2,i},X_{1,i + 1},X_{2,i + 1})\)

proposition := AND(PSI(i) for i in range(m))

Because we just have the deterministic function \(f\) and a fixed initial state, there is only one possible variable assignment.
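
To sanity-check this, here is a brute-force sketch (enumerating assignments instead of calling a SAT solver; all names are illustrative) that unrolls the transition constraints for \(f\) and confirms that, given the initial state, exactly one assignment survives:

(require '[clojure.math.combinatorics :as c])

;; States encoded as [X1 X2]: 0=[0 0], 1=[1 0], 2=[0 1], 3=[1 1].
(def f-encoded {[0 0] [1 0]     ; f(0) = 1
                [1 0] [0 1]     ; f(1) = 2
                [0 1] [1 1]     ; f(2) = 3
                [1 1] [0 0]})   ; f(3) = 0

(defn psi? [v w] (= (f-encoded v) w))   ; one unrolled transition constraint

(def m 6)
(def states [[0 0] [1 0] [0 1] [1 1]])

;; All assignments to X_{0..m} that start in state 0 and satisfy every PSI(i):
(def timelines
  (->> (apply c/cartesian-product (repeat (inc m) states))
       (filter #(= (first %) [0 0]))
       (filter #(every? (fn [[v w]] (psi? v w)) (partition 2 1 %)))))

(count timelines) ; => 1: the deterministic f leaves exactly one assignment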

We care about this because we want to play FSA Zendo. We encode what an FSA is in propositional logic. We then want to use this factual knowledge to infer theorems. Theorems which would be helpful in inferring what the FSA is.

Note that this would not work immediately to solve theorems in general. In Zendo we can formally specify what makes a theorem useful: does it make you take fewer observations/computation/memory/etc. to infer the correct FSA?

29. A Simple World-Model Model πŸ”—

29.1. [O] Part 1 πŸ”—

I first had the idea of linking the "subcomponents" of the current world state with the subcomponents of the model that represents the current world state.

20241029_212731.webp

Figure 1: C.0 Detecting Errors via simultaneous Substructure Intervention

This seems a generally useful tool: it allows you to not only detect that your world state is erroneous, but also where exactly the error is. The error must be related to the concrete variable change we made.

E.g. doing the above with most variables of the world could leave the model and the world in sync after applying the respective update functions. But for one variable there is a mistake. We have now narrowed down the possible errors to (non-comprehensive list):

  • This specific substate is not a faithful representation.
  • The update function does not update this particular substate correctly.

Before we could maybe detect that the model and the actual world state are out of sync. But now we can detect what specific sub parts are out of sync.

This seems especially useful for updating e.g. a Lisp program (which is not differentiable) that describes the update rule in our model.

As the first step I created a high-level formalism. Green underline means that a term is not rigorously defined. At first, I intentionally avoided formalizing these terms to be able to move faster. This worked very well.

Note that you could also frame it as inspecting the list of actions that you know how to perform, then selecting the action (or sequence of actions) that you predict will change the attribute of the world whose modeling you want to verify. Note how this takes into account more information: you could be wrong about how an action changes the world, and if you are, this experiment might also raise an alarm.

20241029_204853.webp

The main idea here is that, given that The Epistemics of Sensory Observations is true, we can verify higher-level abstractions by comparing them with the lower ones.

The above basically outlines a system architecture and desiderata on the components of the architecture.

The next step was to identify and describe the concrete errors that could show up in the various components.

20241029_204906.webp

In my journal I then proceeded to analyse how, given the assumption that the model is perfect except for one small error in only one of the components, I would be able to use the other components to detect and correct this error. (The below contains all red journal entries, which also contain additional thoughts.)

20241029_212611.webp

20241029_212619.webp

I also noticed that for some function like G, D, or P, even in this idealized scenario, I wouldn't know how to fix it if it were anything complicated. I therefore thought about how we might do a special kind of program inference: specifically, how to update an almost correct program, which is very different from Solomonoff induction.

20241029_204912.webp

I also thought about how logic programming might help with this program search.

20241029_204914.webp

20241029_204924.webp

In retrospect I might have spent too much time thinking about how to update a program. Focusing on understanding the high-level details of the model, especially the link function, seems better. I would like to first have a good understanding of how to detect and correct errors, and probably a small toy implementation where functions like G are just maps.

That again feels like it would be too easy, but empirically, talking to the machine spirit is very good.

29.1.0.1. [O] Work Report πŸ”—

I first wrote a meta reflection about how the work went. Most importantly working with analog tools seemed extremely good at keeping me on track.

20241029_213921.webp

20241029_213929.webp

For more see the daily note from 2024-10-29.

29.2. [O] Part 2 πŸ”—

29.2.1. [O] Overview πŸ”—

Today I figured out a simple setup for "homing in on" where a particular problem appears. See Code: Identifying Erroneous Objects in a simple Map World for the main result on that.

I now have a concrete idea to try out in code of how to do program updating, using simple NAND gate programs (see here). There are many different open problems. The most important first step seems to be to create a scaffold around these problems that guides our efforts, as described here.

29.2.2. Code: Identifying Erroneous Objects in a simple Map World πŸ”—

Let's start by writing down all the objects we need.

(def states-world)
(def states-model)
(def states-world-current)
(def states-model-current)
(def world-update-function)
(def model-update-function)
(def model-generate)
(def model-down-translate)
(def link)

Let's first focus on a simple task. Given the current world state, and a model state, create a function that can check if for a particular object in the current model state, that object is correctly representing the object it is supposed to represent.

(def state-world-current [{:id :w0
                            :position [0 2]
                            :velocity [0 -1]}
                           {:id :w1
                            :position [0 0]
                            :velocity [2 2]}])
(def state-model-current [{:id :m0
                            :link :w0
                            :position [0 2]
                            :velocity [0 -1]}
                           {:id :m1
                            :link :w1
                            :position [0 1]
                            :velocity [2 2]}])

(defn link-get-target
  "Given an object in the model state, and the world state, return the curresponding world-object."
  [state-world-current model-state-object]
  (let [link-targets (filter #(= (:id %)
                                (:link model-state-object))
                             state-world-current)]
    (assert (<= (count link-targets) 1) (str "More than one link target found. Matches: " link-targets))
    (if (= (count link-targets) 0)
      nil
      (first link-targets))))


(link-get-target state-world-current (first state-model-current)) ; => {:id :w0, :position [0 2], :velocity [0 -1]}

(defn correct?
  "Given the world state, check if the model-object correctly represents the world-object, it is supposed to represent."
  [state-world-current model-state-object]
  (let [link-target (link-get-target state-world-current model-state-object)]
    (println link-target)
    (println model-state-object)
    (and (= (:position link-target)
            (:position model-state-object))
         (= (:velocity link-target)
            (:velocity model-state-object)))))

(map (partial correct? state-world-current) state-model-current) ; => (true false)

This demonstrates how, given that we already have a linking structure between the model state and the world state, we can detect which objects are incorrect. This allows us to zoom in on where an error occurs. Here we can determine which object is wrong, instead of just that the model is wrong.

This setup is quite hardcoded. In general we want a system that can execute this homing procedure recursively, to zoom in as much as possible on where the error occurs, without comparing the entire world state. This makes sense given the assumption that comparisons at a higher level can be made cheaper, e.g. through hashing.
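
A minimal sketch of that recursive homing (illustrative names; a real system would cache the hashes Merkle-tree style instead of recomputing them):

;; Recursively home in on where two nested states disagree, descending
;; only into subtrees whose (cheap) hashes differ.
(defn diff-paths
  "Return the paths at which world and model disagree."
  [world model path]
  (cond
    (= (hash world) (hash model)) []          ; subtrees (almost surely) agree
    (and (map? world) (map? model))
    (mapcat (fn [k] (diff-paths (get world k) (get model k) (conj path k)))
            (into #{} (concat (keys world) (keys model))))
    :else [path]))                            ; disagreement localized here

(diff-paths {:a {:x 1 :y 2} :b {:z 3}}
            {:a {:x 1 :y 2} :b {:z 4}}
            [])
;; => ([:b :z])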

29.2.3. [O] Analog Capture πŸ”—

20241030_213926.webp

20241030_213940.webp

20241030_213300.webp

20241030_213309.webp

20241030_213338.webp

20241030_213356.webp

See the 2024-10-30 daily note for some more thoughts on:

  • How to figure out how to update programs by introspecting us fixing a software bug.
  • Recursive Hash trees
  • Misc

29.2.4. [O] Work Report πŸ”—

There are a couple of important insights today:

  • Using persp-mode in Emacs is extremely good for keeping me non-distracted, with basically no cost associated with using it.
  • Sticking to analog tools like yesterday still worked great!
  • Analog tools seem to make me more deliberate with what I work on, in a noticeably beneficial way.
  • Today I managed (likely in part due to being more deliberate) to implement something! This is literally the first time I managed to do this (actually finishing it).
  • Setting up a target in advance, but then also continually refining it while I am working on it, seems extremely useful (the previous point's success can be attributed to this in significant part I think).
  • It is useful to "plan" in the form of finding a good arrangement of tasks/requirements that guides you to fulfilling them.
29.2.4.1. Task Maps πŸ”—

Let me say more about the last point. It's not about what to do, but about identifying the general structure of how things need to be done (such that if they were done in another way it would not make sense or would be much worse). It's not about dictating what should be done, but about building a map of reality that you can use such that you don't unnecessarily run into walls, hurting your nose.

So a plan doesn't necessarily say "do X first" if Y could also come first. But it might say "doing X will give lots of information about how to do Y correctly", or "Z requires X to work at all". It contains information you are pretty sure about (and if you are not sure, that needs to be explicitly mapped as such).

A task map also tells you things about how the task map itself needs to be. Tasks soon to be targeted benefit from more precision and more detail drawn on the map. You don't need to know where a street in a city is until you are approaching the city.

A task map should be maximally flexible (not rigid). It is about capturing all possible ways that you could do things, not describing a particular way to do things.

This is so different that I don't want to call it a plan. Let's call it a Task Map.

29.2.4.2. [O] Journal Entries πŸ”—

20241030_223453.webp

20241030_223507.webp

20241030_223515.webp

30. Program Search is the Core πŸ”—

Today I did only meta thinking, evaluating the overall approach.

30.1. The Universal Code-Interpreter Framing πŸ”—

I realized that there is something very fundamental about the structure of having a language and an interpreter:

20241031_195839.webp

I ask what kind of structure the update function (of the world state) in our world model could have. Maybe there is something that is not source code? I was unable to come up with anything that seemed qualitatively different from the language-interpreter structure. Of course, ultimately we'd like our world program to not be arbitrary source code. But it will still be source code of a form (given the definition above). It seems like there isn't an alternative.

This gives me confidence that tackling the problem of arbitrary program inference is the right direction. The core challenge is to figure out how to do it at all efficiently. Once that is figured out, we can focus on crafting a carefully constrained language. This seems good because in a sense I realized that there is basically nothing else to do. The world model update function must take the form of a language and an interpreter (there might be equivalent isomorphic framings).

In summary: Solving program inference is the right direction, because parts of the world model must be expressed as a program. In some form we need to solve this problem.

30.2. The Programs inside a Scientist πŸ”—

Now that we are talking about program inference we need to be very careful, with what program we are talking about in a given context. We have:

  1. Programs making up the description of the world, e.g. the update function.
  2. Programs that infer programs of type 1, e.g. the procedure that updates the update function based on new sensory observations whenever it mispredicts the actually occurring sensory input.
  3. Programs constructing programs of type 2, e.g. SGD, a general optimization procedure that we use in combination with neural networks to "find minds".

We want to avoid constructing a system that uses programs of type 3. We want to figure out how to create an understandable program of type 2. We currently don't need to prioritize making programs of type 1 interpretable. The bottleneck is figuring out how to do anything at all.

20241031_195845.webp

30.3. Language Translation and World Models πŸ”—

20241031_210257.webp

30.4. Capabilities are always Dangerous πŸ”—

If your system is competent it's dangerous.

20241031_195852.webp

Somehow I hadn't internalized this appropriately before. I think I have often in the past fallen into the trap of arguing that a particular algorithm would be safe, or have good properties, based on some condition. And usually I failed to consider this basic point: if it actually worked, it would be dangerous.

It doesn't matter what kind of high-level argument I am making for how this could make the system safe, or safer. If the system actually works it will kill you, until you can give a very precise technical argument for why it won't. And I never had such an argument.

I expect the only way we will be able to give such rigorous arguments is by really understanding the details of how the system we are building works.

Also consider that mechanistic interpretability is really about doing a kind of program inference, where we start with a spaghetti program and try to tidy it up. This is even more evidence that program inference is good. One version of program inference is about inferring a nice program that describes the behavior of a neural network, i.e. extracting its algorithms in a nice format. Figuring out parts of the general program inference problem seems useful for that, and really, if we could extract capabilities of NNs into an understandable format, we could use these as building blocks (with caution).

But it seems probable enough that it is much easier to make a general program inferrer/updater that is understandable from scratch, such that this would be good to try first.

30.5. Mechanistic Interpetability and The Universal Code-Interpreter Framing πŸ”—

20241031_195857.webp

30.6. A simple world model as a diagram πŸ”—

20241031_195904.webp

I drew the above diagram. The nice thing about it is that it contains all the functions and objects that I want to talk about, and all the important information about how they relate to each other.

This diagram is similar to the predictive processing diagrams I made in the past. It seems easy to redraw and might serve as a useful visual representation of the setup in the future.

The relation to category theory breaks down somewhat once we consider having a higher-level abstract world model.

31. Continuous NAND Circuits   ATTACH πŸ”—

Daily Note

We primarily worked on implementing C.11 (NAND world model updating).

star_reaching_miku.jpg

If we implemented name:WBB:C.0 we would be able to efficiently verify hypotheses:

20241102_165801.webp

It's an open question how to generate the hypotheses (which seems to be where most of the difficulty lies).

Any function from fixed-size bitstrings to bitstrings can be implemented as a NAND circuit:

20241102_175116.webp

31.1. Continuous NAND gate πŸ”—

If we build a continuous NAND gate, then we can run SGD on a circuit of such continuous NAND gates to find the right circuit. The following shows a continuous NAND gate.

continuous_nand_gate.webp

20241102_204358.webp
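
For concreteness, one common smooth relaxation of NAND (an assumption for illustration; not necessarily the exact gate in the figures) is \(\mathrm{nand}(x,y) = 1 - xy\), which agrees with the discrete gate on 0/1 inputs:

;; A common smooth relaxation of NAND (illustrative; the parameterized
;; gate in the figures may differ): differentiable, and it agrees with
;; the discrete NAND gate on {0, 1} inputs.
(defn continuous-nand [x y]
  (- 1.0 (* x y)))

(for [x [0 1] y [0 1]] [x y (continuous-nand x y)])
;; => ([0 0 1.0] [0 1 1.0] [1 0 1.0] [1 1 0.0])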

We want to design the loss such that gradient descent converges onto parameter values where the continuous circuit can be transformed into one using discrete NAND gates, i.e. the parameters of each continuous NAND gate X are exactly such that X behaves the same as a discrete one.

20241102_204419.webp

31.2. [Sidetrack] Barycentric Coordinate Optimization πŸ”—

There seems to be something very elegant about barycentric coordinates. In a sense they encode the constraint that all values must sum to one. It feels like you could make an elegant optimization procedure next to which softmax would seem like a hack.

Here is a simple untested algorithm:

20241103_024435.webp

Here is an idea for a loss such that SGD would go to a corner of the simplex of the barycentric coordinate system.

20241103_024253.webp
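
A generic sketch of what such an update could look like (illustrative only, not the algorithm from the journal pages): take a gradient step and renormalize so the coordinates stay on the simplex:

;; Illustrative only: keep a point in barycentric coordinates by stepping
;; against the gradient and renormalizing so the coordinates stay
;; positive and sum to one.
(defn simplex-step [point grad lr]
  (let [stepped (mapv #(max 1e-9 (- %1 (* lr %2))) point grad)
        total   (reduce + stepped)]
    (mapv #(/ % total) stepped)))

(simplex-step [0.5 0.3 0.2] [1.0 -1.0 0.0] 0.1)
;; => [0.4 0.4 0.2] (approximately)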

31.3. Some Potential Directions πŸ”—

20241103_024321.webp

20241103_024257.webp

31.4. Generating NAND circuits   ATTACH πŸ”—

How can we generate NAND circuits? If we have, for each pair of NAND gates in the previous layer, a NAND gate in the current layer, then we get an exponential explosion. Ideally we want a generic procedure that generates NAND circuits that are as small as possible, while still containing the circuits that we are looking for.

20241105_221933.webp

20241105_222003.webp

31.4.1. Stochasticly generated NAND circuits πŸ”—

We could stochastically generate a NAND circuit by randomly selecting nodes from the previous layer to connect to a new NAND gate, until all nodes in the previous layer have at least one outgoing connection.

20241105_222054.webp

All information can flow between two layers when: Let \(L\) be a layer of NAND gates in the NAND circuit. Let \(L'\) be the layer after \(L\). If you can delete, for each NAND gate in \(L'\), an input edge such that you are left with a "one-to-one" mapping of the NAND gates of \(L\) to the NAND gates of \(L'\), then all the information can be passed. Just consider the case where each continuous-id-NAND-gate has its parameters set such that it passes through the input that wasn't deleted. Now \(L'\) outputs the output of \(L\), though potentially in a different "order".

If we now add one more nand gate

31.4.2. Fully Connected NAND circuit Iteration πŸ”—

We could generate a circuit by creating a new type of NAND gate. Let \(C\) be the current circuit. We add a new NAND gate \(g\), which has a matrix of parameters of dimension \(2 \times |C|\), meaning twice as many parameters as there are nodes. To compute the NAND gate, we first compute its input: assume each node already has a value associated with it, put all these values into a vector, and multiply the parameter matrix with it. This gives a vector of dimension 2, where the first entry is the X input and the second is the Y input to the NAND gate.
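
A minimal sketch of this gate (illustrative names; using the smooth relaxation from above as the gate function):

;; Sketch of the fully-connected gate: the parameter matrix P (2 rows,
;; one column per existing node) mixes all node values into the two
;; inputs of a smooth NAND.
(defn fully-connected-nand [P node-values]
  (let [dot   (fn [u v] (reduce + (map * u v)))
        [x y] (map #(dot % node-values) P)] ; row 0 -> X input, row 1 -> Y input
    (- 1.0 (* x y))))                       ; smooth NAND of the mixed inputs

(fully-connected-nand [[1 0 0]              ; picks node 0 as X
                       [0 0 1]]             ; picks node 2 as Y
                      [1 1 0])
;; => 1.0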

20241105_222107.webp

31.4.3. Omega NAND Circuit Iteration πŸ”—

The basic idea here is that we can iterate both in the depth and the width of the circuit at the same time. Conditional on name:WBB:C.28, finding the right depth-width tradeoff parameter could speed up the search in general.

20241105_222118.webp

20241105_222014.webp

20241105_222023.webp

20241105_221947.webp

31.5. Pinpoint Search: Program Search with Intermediate Values πŸ”—

Here a general algorithm is presented for guiding a search process by specifying several intermediate values (they need not be ordered).

20241105_222137.webp

20241105_222143.webp

20241105_222147.webp

31.6. Targetless Compression Search πŸ”—

20241105_222200.webp

20241105_222216.webp

31.7. Some Reflection πŸ”—

20241103_042217.webp

20241103_042304.webp

20241103_042330.webp

20241103_042340.webp

31.7.1. The Smallest NAND gate layer that preserves Information πŸ”—

Here is an important point that should be taken into account when considering how to generate NAND circuits. Note that this was written after the generation methods presented above.

20241109_233916.webp

31.8. Convexity Testing πŸ”—

It would be great if our NAND circuits were convex. Then we could use fast convex optimization algorithms.

20241109_233921.webp

20241109_233928.webp

20241111_160612.webp

20241111_160617.webp

We have implemented the simple 2 point MC convexity test algorithm in the NAND gate project repo.

The basic NAND gate is convex. The continuous NAND gate is non-convex. Somehow the convexity test said that the loss is convex, but this seems very strange and probably wrong. We did not try to understand what is going on here.
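
For reference, a minimal sketch of such a 2-point Monte Carlo convexity test (illustrative; the implementation in the repo may differ in details):

;; Repeatedly sample two points and check the defining inequality
;; f(t*a + (1-t)*b) <= t*f(a) + (1-t)*f(b).
(defn mc-convex? [f sample-point n]
  (every? (fn [_]
            (let [a (sample-point)
                  b (sample-point)
                  t (rand)
                  m (mapv #(+ (* t %1) (* (- 1 t) %2)) a b)]
              (<= (f m) (+ (* t (f a)) (* (- 1 t) (f b)) 1e-9))))
          (range n)))

(mc-convex? (fn [[x y]] (+ (* x x) (* y y)))  ; a convex paraboloid
            (fn [] [(rand 2) (rand 2)])
            1000)
;; => true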

31.9. Aborting Continuous NAND circuits πŸ”—

The current implementation of the NAND circuits doesn't work. It is likely because the optimization gets stuck in a local minimum. But we have not confirmed this!

Usually it's very good to understand why something doesn't work. However, here we are shelving this project, because we noticed that we failed at doing research by not carefully reasoning through the problem. See the notes on Research as a Stochhastic Decision Process for the reasoning of how we failed.

20241109_234201.webp

32. Taking a step back and understanding Program Search πŸ”—

32.1. Program Information Extraction Mechanisms πŸ”—

32.1.1. Iterative Assertion Programming πŸ”—

  • Assertions "link" to where your assumptions are violated. They also tell you what the violating value is.
  • You can also keep the original input, or even a trace of values through the program, that led to the assertion.
  • Assertive programming allows you to write a partial implementation of some function (e.g. a partial description of the world), and assert that certain conditions must hold. If they break, then your code breaks. But instead of your code breaking silently (returning incorrect values), there is now an assertion raised. This assertion could then be passed to the program generator (world model inferrer), such that it knows where it needs to add more code. (See the sketch below.)
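
A tiny sketch of this (illustrative; the object shape and the assumed condition are made up):

;; The assertion names the violated assumption and carries the offending
;; value, which the program generator could use to decide where more
;; code is needed.
(defn update-velocity [obj]
  (assert (contains? obj :velocity)
          (str "assumed every object has a velocity, but got: " obj))
  (update obj :velocity (fn [v] (mapv #(* 2 %) v))))

(update-velocity {:id :w0 :position [0 2]})
;; throws: Assert failed: assumed every object has a velocity, but got:
;;         {:id :w0, :position [0 2]}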

32.1.2. Unit Test Propagation πŸ”—

When the update function in our world model produces the wrong next world state, we want to be able to figure out where exactly the update function is wrong. Assume also that we have an accurate description of the current world state.

When we notice that we fail on some input, we get a unit test for free. However, this unit test is at the outermost layer. We want to propagate it to the lower levels, such that we have the right unit tests for specific functions such that, if they held, our program would also be correct.

TODO nan example

32.1.3. Partial Evaluation πŸ”—

You could partially evaluate incomplete code, and reduce it as much as possible. It feels like some version of this would be really useful for interactive programming, because you can now run your function without supplying all the arguments.

A type system can be layered on top of this, such that we can also propagate the types through the program, spotting inconsistencies.

We can think about it as propagating the properties that we know about each code fragment through the program, to attach new knowledge to other code fragments. E.g. if a function returns the nan constant, then I know that all other numerical operations that take the return value of this function as input will also return nan. Therefore I can actually calculate that part of the program at compile time, because for that special input I know the value.

You can do this for any constant value, such as a string literal from which we select a substring and then trim it.

This works only with pure functions (or with functions we make pure, e.g. by removing the effectful code like printing).
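
A tiny sketch of this kind of partial evaluation (illustrative): if all arguments of a pure function call are literals, compute the call at macroexpansion time and inline the result.

(defmacro const-fold [f & args]
  (if (every? #(or (number? %) (string? %)) args)
    (apply (resolve f) args)                ; reduce now, at "compile time"
    `(~f ~@args)))                          ; otherwise leave the call in place

(macroexpand '(const-fold subs "hello world" 0 5))
;; => "hello"
(macroexpand '(const-fold subs some-string 0 5))
;; => (subs some-string 0 5)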

32.2. Examples of Almost correct code πŸ”—

Let's generate code that satisfies some specification. Then let's make the code slightly wrong. The goal is to generate concrete examples of programs that are only slightly wrong.

We want to use these programs for designing an algorithm that can detect and fix these small mistakes.

32.2.1. Two measures of how wrong a function is πŸ”—

A function can be almost correct because:

  1. A small change to the code makes it correct.
  2. The function gives the right output for most inputs.

Initially we want to make programs that have both properties, and think about what an algorithm that fixes the mistakes would look like.

What we have:

  • We have 2 examples.
  • quicksort spec not tested but implemented.
  • High level idea how to fix constants.

32.2.2. Wrong Constants πŸ”—

A constant is wrong. (After the 1 is corrected to a 0, removing the "+ 0" part is probably simple).

(defn id [x]
  (+ x 1))

(assert (forall x : Int
                (= (id x)
                   x)))

(defn id [x]
  (+ x 0))

This could be solved by making the 1 a constant c and searching with logic programming for what that constant could be for the unit test to pass:

(defn id [x]
  (+ x c))
(defn increment [x]
  (+ x 2))

(assert (forall x : Int
                (= (increment x)
                   (+ x 1))))
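
A brute-force stand-in for that search (illustrative sketch; a real version would use logic programming as suggested above, and all names here are made up):

;; Try candidate constants until the specification holds on all sampled
;; inputs.
(defn find-constant [make-fn spec candidates inputs]
  (first (filter (fn [c]
                   (let [f (make-fn c)]
                     (every? #(spec f %) inputs)))
                 candidates)))

(find-constant (fn [c] (fn [x] (+ x c)))    ; the template (+ x c)
               (fn [f x] (= (f x) x))       ; spec: f should behave like id
               (range -5 6)
               (range -10 11))
;; => 0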

32.2.3. Quicksort πŸ”—

(defn quicksort-correct [xs]
  (cond (= (count xs) 0) xs
        (= (count xs) 1) xs
        :else
        (let* [pivot (first xs)
               larger (filter (fn [x] (< pivot x)) (rest xs))
               smaller (filter (fn [x] (>= pivot x)) (rest xs))]
          (concat (quicksort-correct larger) [pivot] (quicksort-correct smaller)))))

(quicksort-correct [1 5 3 45 5 343 2 3]) ; => (343 45 5 5 3 3 2 1), sorts descending

(defn quicksort-incorrect-1
  "The error only happens for some lists."
  [xs]
  (cond (= (count xs) 0) xs
        (= (count xs) 1) xs
        :else
        (let* [pivot (first xs)
               larger (filter (fn [x] (< pivot x)) (rest xs))
               smaller (filter (fn [x] (> pivot x)) (rest xs))] ; > instead of >= drops elements equal to the pivot
          (concat (quicksort-correct larger) [pivot] (quicksort-correct smaller)))))

(quicksort-incorrect-1 [1 5 3 45 5 343 2 3]) ; => (343 45 5 5 3 3 2 1)
                                     ;; expected: (343 45 5 5 3 3 2 1)

(quicksort-incorrect-1 [1 1]) ; => (1) expected: (1 1)

(defn is-sorted? [input-list]
  ;; note: checks ascending order; the quicksort above sorts descending,
  ;; so it would fail this spec as written
  (every? true?
          (for [i (range (dec (count input-list)))]
            (<= (nth input-list i) (nth input-list (inc i))))))

(defn same-elements? [input-list sorted-list]
  (= (frequencies input-list)
     (frequencies sorted-list)))

(defn correct-sorting-alg? [f input-list]
  (and (is-sorted? (f input-list))
       (same-elements? (f input-list)
                       input-list)))

32.3. Algebra on Programs πŸ”—

One important technique for program search might be to define some algebraic structure on the code that you are searching for, to enable certain kinds of mechanical reasoning about the programs. Things that seem relevant here are function-level programming (John Backus) and the J programming language.

20241111_160525.webp

20241111_160557.webp

32.4. Making a model of Program Search πŸ”—

{:formal-language X :interpreter Y} seems to be an important object. But we don't even have a name for it. We need to better understand program search.

Some parts of the world model need to be expressed as code that will be run by some interpreter, e.g. the update function. The code captures a pattern that is then dynamically unrolled.

To figure out how to generate this part of the world model we need some form of program search. What else could it be?

Now we are at the point where we don't have good models that would tell us what to do next.

32.4.1. What are different forms of program search? πŸ”—

  • Brute-force program search (iterate all programs and check if they are the correct one).
  • evalo (logic programing)
  • Neural Networks
  • SVM
  • ML in general (what is called a model is actually a program)
    • k-NN is not program search

32.4.2. The Semantics of a Program πŸ”—

We spent some time analysing the semantics of programs. Specifically denotational, axiomatic, and operational semantics.

20241111_122041.webp

It feels like the bottleneck in program search is figuring out how we can tell the computer the right semantics. When you write a computer program you need to be very precise, because otherwise the computer can't do what you want. In some sense the computer understands the program, while in another it doesn't at all.

Saying the computer understands the program because it can execute it is extremely misleading. While the CPU runs and fetches instructions there is no "intelligence" going on. There is no "understanding" of what the computer is doing happening in the computer as it runs this code.

A core bottleneck in program search seems to be figuring out how to make the computer "understand" the programs it is manipulating while it tries to construct the correct program (e.g. according to some specification).

32.4.2.1. Semantics πŸ”—

Semantics (CS) are about specifying how a string in a formal language should behave when executed. The formal language here is the set of all valid source codes of a programming language.

This is a higher level abstraction than a concrete interpreter. E.g. we can use logic to create a specification of the input output behavior of an interpreter.

32.4.2.2. Denotational Semantics πŸ”—

Given a formal language \(L\), we assign to every symbol in the language a meaning via a map from the symbol to a mathematical object. E.g. for the string \(1 + 1\) in the formal language (the source code), it is sufficient to say that \(+\) is the mathematical addition function, and \(1\) is the number one. Note that these semantic assignments compose, such that they give a meaning to \(1 + 1\).

Denotational semantics now assigns a concrete result to a computation by maximally reducing the mathematical object. In this case \(1 + 1\) reduces to \(2\). Therefore \(2\) is the result of the program.

So generally we assign meaning to a language by specifying a mathematical translation for each symbol that can appear in a program, such that these meanings compose together such that every string in the formal language now has a meaning. The result of a program is the maximally reduced mathematical object.
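
A toy illustration of this in code (a made-up mini-language; the map from symbols to mathematical objects is the denotation):

;; Symbols are mapped to mathematical objects, and the meaning of an
;; expression is its maximal reduction.
(def denotation {'+ +})                     ; the symbol + means addition

(defn meaning [expr]
  (cond (seq? expr)    (apply (meaning (first expr))
                              (map meaning (rest expr)))
        (number? expr) expr                 ; a numeral means that number
        :else          (denotation expr)))

(meaning '(+ 1 1))
;; => 2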

32.4.2.3. Axiomatic Semantics (specifically, Hoare semantics) πŸ”—

We have a programming language with instructions such as if and while and assignment, x := 3.

Now, for each such instruction, we write down what this implies in terms of logic statements made about the state of the program.

Such a logic statement could be, for example, \(x = 5\). If we had the instruction x := 5, then this statement would become true.

For each instruction, we have a Hoare triple: \(\{ P\}\; C\;\{ Q\}\). This says that if the logic statements \(P\) were true before the instruction, then the statements \(Q\) are true afterwards. So \(P\) refers to the preconditions, \(Q\) refers to the postconditions, and \(C\) refers to the instruction.

This is very reminiscent of STRIPS: https://en.wikipedia.org/wiki/Stanford_Research_Institute_Problem_Solver

Here is the Hoare triple for if:

{:instruction S :precondition '(and B Q) :postconditions '(Q)}


So, if \(S_{1}\) and \(S_{2}\) are instructions that have the same postcondition, but their preconditions differ in that \(S_{1}\) requires \(B\) and \(S_{2}\) requires \(\neg B\), then we can combine them in an if statement.

It feels like Hoare logic would allow you to compute which inputs are valid, based on some properties that you want your outputs to have, and some generic properties that hold for program components.
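
A sketch of that combination step in the map representation used above (illustrative; the key names are made up):

;; From {B ∧ P} s1 {Q} and {(not B) ∧ P} s2 {Q} we may conclude
;; {P} (if B s1 s2) {Q}.
(defn hoare-if [b p q s1 s2]
  {:premises   [{:instruction s1
                 :precondition (list 'and b p)
                 :postconditions q}
                {:instruction s2
                 :precondition (list 'and (list 'not b) p)
                 :postconditions q}]
   :conclusion {:instruction (list 'if b s1 s2)
                :precondition p
                :postconditions q}})

(:conclusion (hoare-if 'B 'P '(Q) 'S1 'S2))
;; => {:instruction (if B S1 S2), :precondition P, :postconditions (Q)}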

32.5. [Aborting] figuring out General Program Search πŸ”—

In Program Search is the Core we argued that really program search is the core thing that we need to figure out to understand world models. This might be true. But we jumped the gun. There are many things we do not understand about world models that seem very important to figure out. We basically failed to think hard and long enough at the right level of abstraction and instead just tried to solve some problem.

E.g. in Back to the basics of World Models we partially cash out roam:Factored%20World%20Models, a thing we knew about for over a year, in the context of the update function of a world model. Factored world models seem very powerful and helpful for future work on program search.

33. Back to the basics of World Models πŸ”—

What do we actually want?

20241110_214548.webp

Here is a simple formalism that captures some important part of the problem we are trying to solve.

20241110_214552.webp

33.1. Factoring the Update Function of a World Model πŸ”—

How can we build a model here? Well, consider that there is some structure in the world.

20241110_214601.webp

If we had all these observations, each highlighting a specific part of the update function that is independent of the other parts, we'd have a factored update function that, even when implemented inefficiently (e.g. with a lookup table), can represent many world states.

But with the method outlined in name:WBB:C.41

20241110_214609.webp

20241110_214649.webp

34. A World to learn a Factored Update Function in πŸ”—

So the goal now is to design an algorithm to infer a factored update function.

I started by thinking about how a toy example would need to look. I first tried to make it as abstract as possible. The setup I came up with assumes:

  • There is a notion of space.
  • A notion of objects (each having a position in space).
  • That the update rule is
    • Symmetric - At every position in the space the same update rule is applied.
    • Local - Objects only interact with the objects near them (over one timestep).

Those last two I got from roam:atoms%20to%20agents.

By thinking a bit about mathematical convolution, I re-arrived at the idea of detecting changes by calculating the difference between the current world state and the world state we get after applying the update function.

34.1. Cellular Automata Rule Learning πŸ”—

Now the next step is to make the setup more concrete. I ended up choosing cellular automata for this.

20241113_201306.webp

Specifically starting by looking at the game of life. First I tried to make a puzzle setup:

20241113_195158.webp

Then I tried to manually find the rules of the Game of Life based on some observations I am making according to this puzzle setup:

20241113_200311.webp

Then I tried to turn the procedure that I executed into a bullet-point algorithm:

20241113_195701.webp

One nice thing about the setup so far is that it should be pretty straightforward to implement. I immediately thought about how to do it in n dimensions.

20241113_195708.webp

I then started to implement it using core.matrix in Clojure.

34.2. Cellular Automata Simulator πŸ”—

(require '[clojure.core.matrix :as m])
(require '[clojure.math.combinatorics :as c])

(defn offsets
  "Returns the index offsets of the neighbors of a cell in an n-D array.

  This doesn't include the offset of the cell itself. Diagonal
  neighbors are included."
  [dimension]
  (->> [-1 0 1]
       (repeat dimension)
       (apply c/cartesian-product)
       (filter (partial not-every? zero?))))

(defn neighbor-array [array]
  ;; TODO: unfinished stub; should collect the neighbor values of each
  ;; cell using offsets.
  )

(offsets 3) ; => the 26 neighbor offsets ((-1 -1 -1) (-1 -1 0) ...)

(def matrix [[1 2 3] [3 4 4]])

(m/emap #(+ 1 %) matrix) ; => [[2 3 4] [4 5 5]]

(m/mget matrix 1 1) ; => 4

(apply c/cartesian-product (map range (m/shape matrix))) ; all index pairs

34.3. Work Reflection πŸ”—

After I had already started to implement a cellular automata simulator, I noticed that it was pretty fun, and that I didn't want to reflect on whether it was actually the best thing to do:

20241113_203642.webp

35. Preventing self-modification is Hard πŸ”—

Notes about why preventing self-modification is hard in an AI agent. From Robert Stroi's interview for AISC-2025.

  • If an agent is aligned with our current goals, it could become misaligned if our preferences are such that they make us execute self-modifications that change our goals, such that these new goals are now in conflict with the previous goals.
  • Have an agent where you blacklist all the bad things (guidelines). Guidelines are a filter on the actions you can take. If unsure whether a guideline will be violated, run the simulation of the world further into the future to potentially get more certain. Only do an action if you are sure it will not fail.
  • When you have an AI that self-modifies, you want to restrict what the possible self-modifications are. If there is a score function it should be able to improve it, but the new code should then be e.g. exec("newAI.py").
  • It's hard to tell an AI not to self-modify, because pressing keys on the keyboard could be used to modify source code, but these atomic actions do not directly link up in a detectable way to "the AI is modifying its source code".
    • The agent needs to really represent its "intention" of what it wants to do, and then needs to not do it based on that model.
    • If we have an AI that has multiple levels of thinking it's very hard to prevent this.
    • So the AI needs to understand what is going on with the low-level generated plan, such that the guidelines can again be applied.
    • However, this doesn't really work if the AI creates or has multiple models to think about reality. Especially if it creates new models. The guidelines might only talk explicitly about one model of reality.
      • There could be a physics simulation of gears and stuff that implements an analog computer that computes a malicious AI.
    • The AI needs to evaluate if the current model/ontology that it is thinking in might be about something that would be filtered in a different model/ontology.
      • If all plans on a high conceptual level are rejected, and the AI starts immediately thinking at, e.g., the atomic level, then it needs to translate them back up to the conceptual level (into possibly multiple types of models) and then evaluate the guidelines in them.
      • Overall we need to make the AI optimize to figure out if the guidelines are violated.
    • We can conceptualize a recursive problem-solving agent as many agents. Each one of them could be misaligned. Perhaps the higher-level agents could understand and evaluate what the lower-level agents are doing.
    • All this seems to be about having the right kind of world model structure.

Robert's original notes (the above notes are not directly based on them).

36. Orca Intelligence πŸ”—

Motivation:

  • They have a language and coordinate hunting.
    • call for
  • Orcas train their kids.
    • Orcas create props out of seaweed, putting them on land to train their kids to hunt on land.

They coordinate hunting. https://www.youtube.com/watch?v=fs8ZveNZQ8g

Understanding orca language seems good to do, even if you want to construct a synthetic language to talk to them. You want to make the language as easy as possible for them to learn and use.

Get a rich context in which orcas are talking, to study how their talking relates to their behavior. E.g. construct a 3D scene of the orcas' locations, what they are doing (what strategies they are using), and which orca is saying what.

Check for patterns between what is being said and their behavior. Things to look out for:

  • What pieces of language repeat exactly?
  • Do individual orcas have names? E.g. is there a construct of COMMAND NAME that causes the orca with that name to do a specific thing?
  • Can you train a neural network to learn to predict orca behavior based on context and language? If yes, there must be structure there.
  • Make the network have a human-understandable representation of its knowledge about orca language.

Other questions:

  • Do orcas exhibit social trickery (which may have been the reason for humans getting large brains)?

Evaluate Capabilities:

  • Let them play video games
    • Initially something very simple, like a specific language instruction toggling the screen to be black or white. If they turn the screen white they get a fish.
    • Have multiple screen segments that can be toggled, each with a different language instruction. The goal is to turn all of them white, to get a fish. When the game is initialized, half of the segments are randomly chosen to show white. The rest show black.

The advantage of having them play games instead of developing a language is that you can immediately start doing experiments. You don't first need to design a language that you could teach the orca, or understand their language. (If you really wanted you could also do both of these at the same time.)

37. "Goal Systems" can be complex πŸ”—

Asking "what is the goal of this system" isn't neccesairly a coherent question. E.g. humans can have many conflicting goals. I want to be healthy, but at the same time I want to eat this icecream infront of me.

We shouldn't expect that by default SGD will find a very simple "goal system".

38. Examples of aligned systems πŸ”—

  • Firefighters risk their lives to put out fires and pull people out of burning buildings. I don't expect they are paid that well for that risk. So being a firefighter probably involves actually caring about helping people.
  • A person that presses a button that kills them but prevents the sun from exploding.
  • A rocket is aligned with me in "trying" to make the airplane that I want to explode, explode.
  • A mother is aligned with her child in that she tries to protect it.

39. Sharing Goals πŸ”—

Claim: Two systems are aligned exactly when they have the same goal. E.g. the rocket and I "want" (take actions to make true) that the airplane explodes.

This is totally wrong. If I want to put a banana in my mouth, you wanting to put a banana into your mouth isn't making a banana be in my mouth. However, we are using the special variables "I" and "my", as in "I want to put a banana into my mouth". These actually point to different things in the world.

40. Ways to frame goals πŸ”—

There are multiple ways to think about goals:

  • The goal of a system is what the system behaviorally seems to be maximizing. E.g. I eat a banana, meaning I have the goal of tasting banana in my mouth.
  • You take into account what the system intends to happen. E.g. I cut the wrong wire and the bomb explodes. Maybe I could have realized by looking at the bomb that it was the wrong wire, but I was too dumb/incompetent to do so. But actually my goal was that the bomb is defused.
  • The reflective goal might be very different from what you want now.

41. Goals can Conflict πŸ”—

It is possible to have multiple goals such that not all goals can achieve maximal value. E.g. I want tasty ice cream, but also want to eat healthy. There is a problem here because the goals can't be easily compared (they're incommensurable).

If instead we had two utility functions U_1 and U_2, then we could decide how much we care about each of these utilities, encode that into constants a and b, and construct a combined utility function U := (fn [x] (+ (* a (U_1 x)) (* b (U_2 x)))).
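As a runnable illustration (my sketch; the state map, the utility functions, and the weights are made-up examples):

(defn u-tasty   [state] (:tastiness state))   ;; hypothetical utility function U_1
(defn u-healthy [state] (:healthiness state)) ;; hypothetical utility function U_2

(def a 0.3) ;; how much we care about tastiness
(def b 0.7) ;; how much we care about healthiness

(defn u-combined [state]
  (+ (* a (u-tasty state))
     (* b (u-healthy state))))

(u-combined {:tastiness 9 :healthiness 2}) ;; => 4.1 (up to floating point rounding)

With the combined utility function the two goals become commensurable: any two world states can now be compared on a single scale.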

42. Converging Turing Machines πŸ”—

Definition: A cell \(c\) on the tape of a Turing machine \(M\) converges to a symbol \(s\) if there exists a timestep \(t_0\) such that, after \(M\) has run for \(t_0\) timesteps, \(c\) has the value \(s\) at all timesteps \(t \geq t_0\). Note that it is ok for the Turing machine to write a symbol to the cell. It just needs to always be the same symbol as is already in the cell.

Definition: The tape of a Turing machine is said to converge to a configuration \(C\) if all cells converge to a value.

Definition: A cell is said to converge into a cycle if the cell's value, starting from some timestep \(t_0\), can be described as cycling through a vector of finite length. E.g. if a cell has the following sequence of values [0 0 1 0 1 0 0 1 0 1 0 0 1 0 1 …] going on like this forever, it's the same as (cycle [0 0 1 0 1]).
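Convergence into a cycle is not decidable from a finite observation, but given a recorded prefix of a cell's history we can at least check whether it is consistent with a cycle. A minimal Clojure sketch (my example, not from the original notes):

(defn consistent-with-cycle?
  "Check whether the recorded VALUES are consistent with the cell cycling
  with period CYCLE-LEN from timestep T0 onwards."
  [values t0 cycle-len]
  (let [tail (drop t0 values)]
    ;; every value must equal the value cycle-len steps later
    (every? true? (map = tail (drop cycle-len tail)))))

(consistent-with-cycle? [1 1 0 0 1 0 1 0 0 1 0 1] 2 5) ;; => true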

43. Controlled Generation: pruning without generating and then filtering possibilities πŸ”—

As a motivating example, compare randomly-sample-range-naive and randomly-sample-range below.

(defn randomly-sample-range-naive [a b array]
  (let [idx (rand-int (count array))]
    (if (or (< idx a)
            (>= idx b))
      (recur a b array)
      (get array idx))))

(repeatedly 20
            #(randomly-sample-range-naive 1 4 [10 20 30 40 50 60 70]))
;; => (30 30 20 20 40 40 30 20 40 20 40 40 40 40 40 30 40 20 30 20)

(defn randomly-sample-range
  "Randomly sample an element from `array` with index i, s.t. b > i >= a."
  [a b array]
  (let [index (+ (rand-int (- b a))
                 a)]
    (get array index)))

(repeatedly 20
            #(randomly-sample-range 1 4 [10 20 30 40 50 60 70]))
;; => (30 30 20 20 20 20 30 40 20 40 20 30 20 40 20 30 30 30 40 20)

randomly-sample-range-naive has an unbounded worst-case runtime (each try only succeeds with probability \((b-a)/n\) for an array of length \(n\), so in expectation it needs \(n/(b-a)\) tries), whereas randomly-sample-range always takes O(1). randomly-sample-range-naive generates possibilities and then prunes unsuitable ones, whereas randomly-sample-range is guaranteed to generate a correct value "on the first try".

Generalizing the above a bit, we can perform controlled generation for sampling by creating a procedure that will only ever output indices of elements we want to get.

The filtering approach can be made much more efficient by encoding the problem in a tree structure. When we filter a node, we also automatically filter all child nodes. This is called pruning.

Naive pruning may be less efficient than controlled generation, because we still need to evaluate the node that will be pruned. Controlled generation would be about jumping ahead in the tree traversal, such that we do not need to evaluate some number of nodes.

In general controlled generation is about only generating things you want (you never generate something and then reject it). Of course controlled generation can be combined with filtering.
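A tiny additional example of the same idea (my sketch): sampling a random multiple of 3 below n. The filtering version rejects about two thirds of its candidates; the controlled version only ever generates valid values.

(defn rand-multiple-of-3-filtered [n]
  (let [x (rand-int n)]
    (if (zero? (mod x 3))
      x
      ;; rejected: generate a new candidate
      (recur n))))

(defn rand-multiple-of-3 [n]
  ;; controlled generation: directly index into the multiples 0, 3, 6, ... below n
  (* 3 (rand-int (quot (+ n 2) 3))))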

Motivation: There is an enormous number of ideas I could have. But the human brain doesn't even generate most of them. It immediately generates good ones.

43.1. Scratch πŸ”—

This is very different from evolutionary algorithms, which progress by generating random things and then filtering afterwards.

Even if every thing that moves is not in a contiguous region but at a single index, it's still not super slow in some sense, because you can programmatically write an algorithm that corresponds to skipping exactly these indices. That would be expensive once, but afterwards would be cheap, and probably cheaper than creating a new array with the moving things filtered out.

43.2. Reflection πŸ”—

  • The example was very useful.
    • The example is very simple, and not something that seems natural to think of when thinking about human cognition (which is what I started with).
    • We identified an important algorithm and then tried to figure it out in concrete details.
  • Examples are hard to come up with
    • When having an abstract concept like "humans don't generate extremely dumb things", it's unnatural by default to think "that is like accessing an array via a procedure that implicitly filters by not generating certain indices". That's a big step to take. You need to move closer gradually such that you can have this idea, e.g. by first thinking about where in computer science things are generated ("Ah, iterators").
    • Seems trainable.
  • Johannes often fails by compartmentalization. E.g. I didn't realize until now that game dev is useful for research.
    • Compartmentalization might prevent you from coming up with good examples.
  • Eliezer thinks of brains as engines that do computations.

43.2.1. TODO Generating examples seems extremely important. πŸ”—

  • Somehow people only know how to go from concrete to abstract but not from abstract to concrete.
  • Richard Feynman was good at generating examples by doing computations in his head, from which he could then extract the general pattern.
  • From an algorithmic perspective examples are great because once you have a more concrete setup with filled-in values you can execute the algorithm. If you have a concrete example you can use your knowledge about the concrete objects in the example to predict what will happen, which generates more data that allows you to extract the general pattern.
    • Example: I want to get a proof that some function is convex. I am stuck. I am putting in different concrete values into the function and see how it behaves. I try to recognize patterns, and then extract them in the form of properties that I can now work with to complete my proof.
  • Examples provide feedback
    • If you can't generate an example you likely haven't understood the thing sufficiently.
  • When you first can't think of an example, and then you think and manage to generate one, it means that you must have expanded your understanding.
  • It's important to give many qualitatively different examples. That makes it easier to find the common patterns.
  • Examples can be generated in different ways. Examples usually have the property that they are a "possible world":
    • Describe actual observations that happened in the past.
    • Generate fictional scenarios that satisfy the constraints.
    • A counterexample that describes something that is possible but violates the constraints.
    • A contradiction example, that describes an impossible world and points out why it is impossible. E.g. it's impossible for (apply + [4 4]) = 5, because 4 + 4 = 8. This could be used to show that some constraints don't make sense. Also you can say that a question like "why is (apply + [4 4]) equal to 5" doesn't make sense.

Before you can give examples you need to bound the topic. E.g. let's understand how human

43.2.2. MAYBE How to get feedback from your ideas πŸ”—

Feedback is important. You want to maximize the frequency and information content of feedback. Trying a random combination for a combination lock gives little feedback. Performing the experiment of adding 4g of sugar to a petri dish with bacteria gives you a lot of feedback, because you can analyze exactly what happens with a microscope.

  • Analyze failed ideas and understand why they failed. This allows you to bias the generating distribution.
    • I notice that the agenda of using SGD to generate a NAND circuit doesn't work. I think about how I got to doing something that doesn't work. I realize that it was because I was just doing the first exciting thing, without reflecting, writing down the motivation for why it is good, and weighing this topic against other things I could be doing. I am now more likely to not just follow my first hunch, because I noticed that this can make me fail.
  • ? Try to generate examples
  • ? Write code
    • Test the Code
      • I think it will work. I write tests. Either it doesn't work, or my model of the algorithm was wrong: it does work, but the tests fail.
  • ? Formalize into math
  • Explain to other people
    • I explain the WebGPU thing to Yannick and he questions why this is a good thing to do. It forces me to evaluate why it makes sense, ensuring that it is actually good to do (see Motivation for learning Computer Graphics).
  • ? Explain to someone not familiar with the jargon
  • ? Explain to a rubber duck
  • ? Write a simple blog post

44. Learning Morphology Through Play: algorithms for building models of your body that allow you to efficiently traverse the world's state space πŸ”—

We want to understand world modeling. The most basic form of world modeling is to learn how to control your body. Having a model of your body is required for performing any task. This is potentially an unsolved problem (further research needed). There is an evolutionary algorithm that learns a starfish robot's body, but I am talking about knowing the explicit algorithm for building up the model, more like how a human would build it.

The goal of the project is to create very simple games with basic control tasks. Each time the game is randomized:

  • The action indices are permuted (e.g. W now moves you left instead of forward)
  • The control scheme is changed (e.g. tank controls vs mouse look vs setting a direction via analog stick)
  • A different morphology is randomly generated (e.g. your spaceship has a different shape; thrusters might be relocated)

The algorithm needs to efficiently learn its morphology and control setup, such that it then becomes good at reaching arbitrary world states. E.g. learn to control a spaceship such that you can reach any point as quickly as possible.

The goal is to make the game as simple as possible and then iteratively make it more complex, updating the algorithm along the way.

44.1. Space Control Game πŸ”—

44.1.1. MVP πŸ”—

  • 2D
  • The agent knows its own location.
  • The agent can press a key.
  • Each time the game resets, the controls are randomly assigned.
  • The agent knows the goal coordinate.

:TODO:

(defn agent [current-pos])
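A minimal sketch of what such an agent could look like (my guess at an implementation; press-key!, get-pos, and the key set are assumed interfaces to the game, and controls are assumed deterministic): first press each key once and record the observed displacement, then greedily press the key whose displacement brings the avatar closest to the goal.

(defn learn-controls
  "Press each key once and record the displacement it caused."
  [keys press-key! get-pos]
  (into {}
        (for [k keys]
          (let [before (get-pos)
                _      (press-key! k)
                after  (get-pos)]
            [k (mapv - after before)]))))

(defn dist-sq [a b]
  (reduce + (map #(* % %) (map - a b))))

(defn step-towards!
  "Press the key whose learned displacement gets us closest to GOAL."
  [goal controls press-key! get-pos]
  (let [pos  (get-pos)
        best (apply min-key
                    (fn [[_ delta]] (dist-sq (mapv + pos delta) goal))
                    (seq controls))]
    (press-key! (key best))))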

44.1.2. Mods πŸ”—

Other variables that can change:

  • Drag
  • The thrust of main engine varies.
  • The thrust of the rotational thrusters varies.
  • Gravity of star in the center.
  • The game can also be 3D.
  • You need

44.2. Lunar Lander Game πŸ”—

Gustaf ask "what is the maximum landing speed to not crash". This is a very important parameter that when you figure it out, helps you play the game well (you can use less fuel). It would be good if we had an algorithm that can when modeling the game, ask and answer such questions.

Gustaf noticed that there must be a rule that determines what velocity makes you explode. Implicit assumptions:

  • In the situation where
    • the floor is even,
    • you have no horizontal velocity,
    • you are looking straight down,
  • you can die or not die based on your vertical velocity.

45. Making the computer do what I want: Reframing Alignment from a Programming Perspective πŸ”—

I want to become really good at making the computer do what I want. It's a special case of becoming really good at state space control, i.e. manipulating the path that reality takes through time, in the form of a sequence of world states.

Alignment of course is the ultimate goal. I am not that good at programming. Somehow it seems good to become good at making the computer do what I want when writing "simple" programs. How would I be able to write a superintelligent program and make it do what I want, if I can't even write an algorithm that sorts a list when I want a list to be sorted? These two problems are very different. Aligning a superintelligent program is much harder, and requires many skills not necessary for simple programs. But it seems nobody who is good at reasoning about and manipulating superintelligent programs would be bad at reasoning about and manipulating simple programs.

Making computers do what I want can also be seen from a higher-level perspective. E.g. I want my desktop environment to do what I want.

46. Letting Yourself be Shutdown is Unnatural πŸ”—

Imagine you are in a simulation. You have only been trained for inclusive genetic fitness. Now you learn that somebody is about to shut down the simulation. Imagine I actually was aligned with inclusive genetic fitness: when I try to maximize it, I would still try to prevent the simulation being shut down. In fact, no matter what goal I have, I would want the simulation to not be shut down.

You need more to not care about being shut down. By default you do detect the causal relationship that being shut down would prevent you from optimizing. You need a sort of weird mind structure to be able to care about having sex, but not care about being shut down, even though being shut down means you can't have sex anymore.

47. RLHF might always Result in Weird Minds Estranged from Humans πŸ”—

What does it look like to give RLHF feedback? What happens if you only ever give good feedback, i.e. you never give a thumbs up on outputs that only an AI without human values would produce? Would RLHF still give you a very weird AI? I think so, because you can't test it on all possible prompts, so on the untested prompts it might still act weird. Maybe the hammer of RLHF, even if you always give the right up- or downvote, would beat the AI into some very weird kind of mind that has very different values from a human.

Even if we assume that the human feedback model is perfect, this is still a problem.

E.g. the model might say that it would never say * even if it could save 1 million people by saying it. That is not what a human would do.

48. A Research House πŸ”—

How can you make people live together and be productive researchers?

  • People aren't working together.
    • Tried: Invite lots of people there.
    • People might prefer to talk among themselves. Often this is because there are no other sufficiently competent people.
  • Not enough people to make it worth it.
  • Need to do maintenance.
  • You need a group of very competent people.
  • You need to be able to explain your ideas very well, such that you can explain them to competent people who then want to work on them based on their understanding.
  • You don't need to be a good researcher to be a good research assistant (Γ  la John).

49. Conva: Exfohazards with Julia πŸ”—

These are some very rough notes I took when thinking with Julia about exfohazard policy.

  • You need to generate all the reasons why something is good or bad and then weigh them.
    • Pro reason: A mathematician could point out a theory that is very useful.

Julia: Publishing is good because:

  • Feedback on your work
    • Point you at relevant existing work
    • Point out flaws
  • Status
    • Getting funding
    • Making it easier to get collaborators
      • Apply to program
      • People can evaluate you based on your output
      • Random people get to know about you and might contact you.
  • Direct Usefulness
    • Other people can do useful things, e.g. you show how a particular AI system is dangerous to get it shut down.

Not posting is right when posting is net negative (because the content is useful in a dangerous way).

Spawning hazard:

  • A forkhazard is when you reframe an existing idea in a way that makes it more powerful. E.g. you could point out how HVM could be used to do program search. This might make more people think about the thing, making it more likely that somebody finds a dangerous application of HVM if one exists (which seems likely).
  • It could be something as simple as pointing out some existing mathematical concept, without any novelty.
  • It is called a forkhazard because by talking about it you spawn new processes that reason about a particular topic, making it more likely that some process will find something dangerous.

Julia's Forkhazard:

  • Either something is harmful or useful when it comes to talking about capabilities ideas.

Often you have pioneers that do novel stuff to the point where they have a flashy demonstration.

The lone wolf researchers that read random articles and have good insight models might read your research and do something dangerous with it.

49.1. Bucketing πŸ”—

The Bucketing post

Bucketing was the strategy used by Julia to think better about exfohazards.

Bucketing is about how to organize knowledge such that you can run a fast local "cash out the truth" update on a variable change. Having different buckets allows you to not run the automatic truth-propagation algorithm.

Bucketing is about disabling the automatic update and taking manual control over the inference.

In the above we initially have one bucket. One thing in there is ""

49.2. Explaining my research πŸ”—

What I am trying to do is get a model of the world such that we can think about what is going on in terms of the world model. Similar to how you can easily tell Deep Blue to never take a knight.

What I mean with understanding the AI that we build is that we have a very good gears-level model of the internal mechanisms that make the system reason.

Understanding means that if the system would kill you, you will ALWAYS be able to have "the intuition" that it will kill you.

50. How my research might fail πŸ”—

I am trying to get an AI that we understand. I could fail in this by having a bad model of what it means to understand. Understanding needs to mean, at minimum, that if the system would kill you, you will actually be able to notice that this is so, with a 100% success rate. But ideally it also means that you can efficiently reason about the system, such that you can figure out how to make it not kill you, with certainty.

Understanding here means to have an accurate, concise, and easy-to-use model.

However, note that when I say "the system" I am talking about the reasoning algorithms, and not e.g. a potentially humongous model of the world that the system has generated (though being able to understand this as much as possible is good too).

51. Constructing a simple Language πŸ”—

51.1. Motivation πŸ”—

Language is tightly related to world models. Studying language might reveal more about how humans model the world.

One important thought I just had in relation to understanding language in order to understand intelligence: language is like serializing your world model such that you can communicate it to others. However, a computer mind does not have this problem, as it can just exchange data structures directly. So avoid spending lots of time figuring out language features that are not necessary for artificial minds.

51.2. Information VS Presentation πŸ”—

We want to separate the presentation and the information of a language. A language might have subjects, verbs, and objects. That is about what information the language can represent. In what order subject, object, and verb are put in a sentence is about presentation. The choice is a mere convention. You need to put them in some order, but the order doesn't change what can be expressed in the language.

51.3. Lisp and Language πŸ”—

Lisp is special because it has a quoting mechanism. This facilitates treating source code as a first-class citizen. A piece of code can be quoted such that it returns a data structure that represents the code (a nested list).

Another interesting feature of Clojure in particular is that there are 3 objects the language allows us to talk about:

  • Symbols (e.g. inc (or 'inc in source code))
  • Vars (e.g. inc is a symbol pointing to the Var that has the inc function)
  • The result of using the function (e.g. (inc 4))
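A minimal REPL illustration of these three levels:

'inc         ;; => inc                  (the symbol)
(class 'inc) ;; => clojure.lang.Symbol
#'inc        ;; => #'clojure.core/inc   (the var the symbol resolves to)
(inc 4)      ;; => 5                    (the result of using the function)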

51.4. Types of Quotes in Natural Language πŸ”—

There are different types of quotes in language:

  • Banana is yellow - This translates to an assessment about the world.
  • Bob believes "Banana is yellow" - We say that somebody made an assessment about the world.
  • The following characters are written on the page: 'Banana is yellow' - We don't evaluate the statement 'Banana is yellow'; we are talking about the characters that evaluate to something. This most closely matches Lisp's quoting.

51.5. Indirect references are unnecessary πŸ”—

Hypothesis: Instead of saying "Bob plays. He likes it." we can say "Bob plays. Bob likes it." This also works for unknown people/objects: instead of "Maybe a stone fell down. It was probably large." we can say "Maybe a stone fell down. That stone was probably large." So we don't need any indirect reference words.

51.6. Basic Language Constructs πŸ”—

There are 3 basic constructs in language: facts, procedures, and exclamations.

Assertions:

  • Bananas are yellow.
  • Bananas are tasty.

Procedures:

  • Instructions for eating a banana.
    1. Obtain the banana.
    2. Grab the stalk and pull backwards.
    3. Grab the other pieces of the banana still at the top.
    4. Put the banana in your mouth.
    5. Chew

Exclamations/Sounds:

  • Ahhhhhhhhhhhhh.

Facts are either true, false, or nonsensical, e.g. "I jump a bread."

Procedures aren't true or false. However they can be impossible to execute. We can also make assertions about procedures.

There are a bunch of modifiers that can be added on top of facts.

Modifiers

  • Questions - Any fact can be turned into a question. If A is a fact, the question asks what the value of x is in eval(A) = x, i.e. what the formula evaluates to.
  • Assertions - Any fact can be asserted to be true. We generate a new fact. E.g. "bananas are yellow" in natural language translates to "eval(bananas are yellow) = true". So in English a sentence is by default interpreted as an assertion.
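A small sketch of how these two modifiers could look in Clojure (my example; the banana fact representation is made up):

(def banana {:color :yellow})

;; a fact is a quoted form; asking a question means evaluating it
(def fact '(= (:color banana) :yellow))

(defn question [fact] (eval fact))           ;; what does eval(A) return?
(defn assertion [fact] (= true (eval fact))) ;; claim that eval(A) = true

(question fact)  ;; => true
(assertion fact) ;; => true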

51.7. Notes taken during conversation πŸ”—

51.7.1. World Descriptions in Language πŸ”—

Every fact is in fact a description of a world.

51.7.2. Relative clauses are unneccesary πŸ”—

51.7.3. He arrives πŸ”—

  • We need to make the language be unambiguous.
  • Ground the language in the model of the world (the description of the surroundings)

Complementizers:

  • "She belives that he will arrive on time."
  • Simplify to: "She belives that he arrived."

Alice says: "hello there"

Says is a verb that takes a quoted thing.

Alice believes "bob arrives on time".

We can think of the quoted sentence as a program (like in Lisp) that is passed to the believes function, which takes the source code for a program.

Macros are cool

  • Execute code as the macro is expanded. cond can take arbitrarily many clauses.
  • She was happy that he arrived.
  • happy(she) because arrived(he)

Causal relation.

  • I believe that our hypothesis is correct.
  • believe(I, "our hypothesis is correct")
  • She wondered whether the study design was robust.
  • evaluatetruth(she, "the study design was robust")

The quoted things are descriptions of the world. Complementizer clauses can be interpreted as descriptions of the world.

Questions:

  • How to model time.
    • Have predicate have time.

happy(she) because arrived(he)

  • arity of 2

"I jump on the carpet." jump has a similar role to that in previous examples.

  • John prefers that it rains.
  • prefers(John, "it rains")
  • "John prefers that it rains." this is a statement about John's brain.
  • "rains -> happy(John)" this is a statement about the world.
  • "believes(John, 'rains -> happy(John)')" this is how we could express "John prefers that it rains." more precisely.

Composition.

  • Nouns should be combinable. E.g. German "KΓΌhlschrank" (literally "cool cabinet", i.e. refrigerator) is good.

Wanting seems like: "if you could set X=4 in the world state, you would" is saying that you want X=4.

In "happy(Sara)" happy is a function (it need not be quoted).

  • "She was happy that he gave the present to her." -

Language Constructs:

  • Algorithm (imperative): E.g. take a banana and eat it.
  • Question: E.g. is the banana ripe?
  • Assertion: E.g. bananas can be orange.

"Sara is happy because bob gave her a present." This seems like that the 'that' is impying that Sara knows why she is happy. If we say "Sara is happy because bob gave her a present." then it could be that the present is what makes sara happy, causaly downstream of Bob giving her a present. Sara could not know, or not care that the present is from bob.

52. Types of Cached Thoughts in a Mind πŸ”—

The details of this model are probably wrong, but some setup like this is probably contained in any mind.

Our mind contains a model of the world. Without gathering new observations about the world, we can use this model to make inferences. E.g. if I observe that a banana gets squashed when I sit on it, I can infer with reasonably high probability that running a car tire over a banana will squash it too.

Once I have thought this, I can cache this computation. If somebody asks me if a banana gets squashed by running a car over it, I can immediately say "yes".

This is the long term cache.
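Clojure's memoize gives a toy version of this long-term cache (my illustration; infer-squashed? is a made-up stand-in for an expensive inference):

(defn infer-squashed?
  "Stand-in for an expensive inference over the world model."
  [object]
  (Thread/sleep 1000) ;; simulate slow reasoning
  (contains? #{:banana :tomato} object))

(def infer-squashed?-cached (memoize infer-squashed?))

(infer-squashed?-cached :banana) ;; slow: runs the inference
(infer-squashed?-cached :banana) ;; instant: answered from the cache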

While reasoning about something I can pull things into working memory. Working memory can be thought of as a type of cache. I might think about bananas being run over by car tires, then apples being run over by car tires, and then how tasty apples are. If I now want to think of another tasty fruit, it's likely that I think of a banana: even though I wasn't thinking about bananas at all in the last reasoning step, they are still loaded in working memory and therefore easy to access. Easier than e.g. a date.

There seem to be close analogies between this second kind of caching in working memory and the cache hierarchies of modern CPUs. Modern CPUs couldn't achieve the performance that they do, with reasonable energy consumption, without these cache hierarchies.

Hypothesis: Cache hierarchies might be a necessity for any high-performance computational system.

53. Designing a General-Knowledge Generator [2025-02-18 Di] πŸ”—

Goal: An algorithm that can generate generally useful knowledge.

Examples of generally useful knowledge:

  • Programming - Any goal requires running the right computation (potentially on your brain).
    • Sorting
    • map/fold
  • Scientific Method - An accurate world model is required to optimize the world.
    • Physics - A good world model contains the fundamental laws of reality.
      • Rocket Science

Building an algorithm that finds generally useful things might be easier than writing an algorithm that finds a good sorting function. A general system can build up knowledge iteratively: finding new knowledge bootstraps off previously found knowledge. So trying to find an algorithm that, given some predicate like

\(\text{sorted?}\left( \text{xs} \right) ≔ \forall i \in \left\{ 1,2,\ldots,\text{ length}\left( \text{xs} \right) - 1 \right\},\text{ xs}\lbrack i\rbrack < \text{ xs}\lbrack i + 1\rbrack\)

finds a sorting algorithm doesn't seem like the right direction.

Let's investigate the domain of writing programs. What algorithm can write code that is generally useful for writing more code? Here a function constitutes a piece of reusable knowledge. E.g. I can use the reduce function to sum a list:

(define (sum xs)
  (reduce + xs))

Without reduce I'd need to write:

(define (sum (x . xs))
  (if (nil? xs)
    x
    (+ x (sum xs))))

If we were to make it tail recursive, and if we stopped using dot notation, it would be even longer. Many functions can be implemented more concisely with reduce; therefore reduce is generally useful for writing programs.

To programming we can apply the "a concept is powerful if it allows us to reach many more world states" framing. World states are about running a program with a particular functional behavior. Actions are writing and running source code. Concepts are reusable behaviors, e.g. as encapsulated in function definitions, together with some "understanding" of when and how to use each behavior. It's not sufficient to have the definition of reduce; the system needs to recognize the situations where reduce is the right tool, and be able to call it with the right arguments to create the target behavior.

53.1. Defining the Most Compressing Function πŸ”—

Imagine your goal is to write functions in a programming language that are generally useful for programming. Here we define the function \(f\) that, if we could use it "for free" (i.e. using it doesn't count towards the complexity of the overall program), would make all other programs maximally shorter to write. Making programs shorter to write seems like a good proxy for being generally useful.

Let lang-basic be a base programming language. The base programming language does not have any functions defined. Let (defun all-programs (lang) ...) compute the set of all source codes that are valid programs in lang. Let a function definition be (defun make-function-definition (name args body) (list name args body)) and (def f-def (make-function-definition ...)) be a function definition. Let (defun extend (lang f-def) ...) be a function that extends a language lang with a function definition f-def, i.e. the NAME of f-def can appear in a program prog, but the implementation of f-def doesn't count towards the Kolmogorov complexity of prog.

(defun smallest-program-with-same-behavior (program)
  (argmin (p in (filter (fn [p]
                          (functionally-equivalent p program))
                        (all-programs lang)))
          (len p)))

This is a Kolmogorov compressor that finds the smallest program that is functionally equivalent to program.

We can now define a function that finds which function, if added to a language lang, would make all programs maximally shorter to write:

(defun get-best-function-to-add (lang)
  (argmin (f in all-functions)
          (sum (map src-code-length
                    (set (map smallest-program-with-same-behavior
                              (all-programs (extend lang f))))))))

To build a library of functions that are generally useful for programming, we repeatedly call this procedure:

(defun build-library (lang)
  (build-library (extend lang (get-best-function-to-add lang))))
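As written, build-library never terminates. A bounded variant (my sketch in Clojure, assuming extend and get-best-function-to-add exist as defined above):

(defn build-library-n
  "Grow the library for n iterations instead of recursing forever."
  [lang n]
  (if (zero? n)
    lang
    (recur (extend lang (get-best-function-to-add lang))
           (dec n))))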

53.2. Open Questions: πŸ”—

  • How can we efficiently find the best (or a very good) function to add to the library?
  • How can we give the algorithm an "understanding" of when and how to use the functions in the library?

53.3. Program Search by Structure Finding πŸ”—

Relevant notes start at headline 16.

20250220_225148.webp

20250220_225225.webp

20250220_225257.webp

20250220_225315.webp

20250220_225343.webp

20250220_225410.webp

20250220_225429.webp

53.4. Scratchpads πŸ”—

53.4.1. 2025-02-18 Creativity   with_negar πŸ”—

What are the algorithms for creativity?

We want an algorithm that finds things that are useful in general, i.e. things that allow you to manipulate reality. E.g. knowing physics is required for building a rocket.

What are examples of humans reasoning creatively?

  • Building a giant water-powered wheat farm in Minecraft is a creative way to farm crops.

In general, when playing a game, we say a strategy is creative when it's a non-obvious strategy that is better than some default strategy.

Dumb alg: Let \(W\) be all possible world states of some world. Let \(K\) be the knowledge of the agent. We want a procedure to find good knowledge. Let knowledge be a function, e.g. let \(\text{move-to-location}:\text{ world-state } \rightarrow \text{ position } \rightarrow \text{ List(Action)}\) be the function that tells me what actions I need to take to reach a particular position in \(W\).

Let \(\text{reach }:\text{ WorldState } \rightarrow \text{ Knowledge } \rightarrow \text{ subset}(W)\) be a function that tells us what world states we can reach from a starting state with our current knowledge.

Let \(s\) be some world state. Then \(\text{reach}(s,K)\) tells us how good the knowledge is.

Imagine you play Minecraft and don't know that wheat can be harvested with water. We could now compute, from our current world state and the knowledge we have, which world states we can reach. None of these world states include us harvesting wheat with water. If we now add to our knowledge base that wheat can be harvested with water, we can reach more world states. Specifically, we can reach a world state where we have more wheat in less time, because we can farm it faster.

The score of a piece of knowledge is how many more world states we can reach if we know that piece of knowledge.
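This score is directly expressible as code (my sketch; reach is assumed as defined above, and world-state sets are assumed finite and countable):

(defn knowledge-score
  "How many additional world states become reachable by adding k to K."
  [reach s K k]
  (- (count (reach s (conj K k)))
     (count (reach s K))))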

Currently we are assuming that we always β€œknow” what a function does, so finding a function with a high score is useful.

How could an algorithm discover a sorting algorithm:

\(\text{sorted?}\left( \text{xs} \right) ≔ \forall i \in \left\{ 1,2,\ldots,\text{ length}\left( \text{xs} \right) - 1 \right\},\text{ xs}\lbrack i\rbrack < \text{ xs}\lbrack i + 1\rbrack\)

Can we do a score over programs such that useful programs would be found?

In order to get a diamond in Minecraft you need to learn Minecraft. In order to design a sorting algorithm you first need to learn how to program. We need to write a program that can learn a domain of interest in general.

How could an algorithm learn programming?

  • We could start with some primitive knowledge, e.g. how an if expression works.

There are two objects here: source code, and the behavior of source code.

A function in code is a piece of knowledge that can be reused to create a new program with specific behavior. E.g. I can write

(defun sum (xs)
  (reduce + xs))

to sum a list instead of

(defun sum (xs)
  (if (nil? (cdr xs))
    (car xs)
    (+ (car xs) (sum (cdr xs)))))

The important thing is how quickly I can write the code, but source code length is a good approximation for this.

Knowledge is only useful if you know how to use it. If I didn't know that reduce was useful here, I would not have used it. How can we make the computer know when something is useful?

Another example is regular expressions. It's useful to invent and implement regular expressions because they allow you to concisely talk about parts of a string, or check if a string matches a regular expression.

How do we make an algorithm find such useful things? The output of the algorithm would be a new useful function. The algorithm can then reuse this function when writing the next useful function.

A function \(f\) is good if, for the set of all possible programs \(P\) (that don't use any functions), you can rewrite the programs of \(P\) using \(f\) such that the length of the source code is reduced maximally. (This is a proxy; the actual thing we want is that writing other programs becomes easier.)

Open Problems

  • Handle that there are multiple programs that have the same behavior.
  • How can we find fast algorithms.
  • How can we find a useful function efficiently.

For now let's assume we have an infinitely fast computer so we don't care that the algorithms we find are efficient, just that we can describe computations with little source code.

The task: Which function should I add such that it is most useful for writing code in the future? Initially this is sort of like writing the standard library of a language, i.e. functions that are useful in general for writing programs. Write code such that it makes writing all the other code easier.

Desiderata of a function we want to find:

  • The code that implements some functional behavior should be the shortest code that implements that functional behavior. We don't want the things we find to be longer than they need to be.

54. Organizing Material through Distillation by Garbage Collection [2025-03-02 So] πŸ”—

20250302_195732.webp

20250302_195738.webp

20250302_195745.webp

20250302_195749.webp

55. Reidentifying the Core Problem [2025-03-02 So]   reorientation πŸ”—

20250302_202306.webp

56. How to make a model of the world [2025-03-02 So]   with_negar πŸ”—

Goal: The algorithm can autonomously and efficiently create an effective model of a world through interacting with that world. The model allows the agent to manipulate the environment such that arbitrary world states can be reached quickly.

Motivation: World modeling is the core bottleneck to intelligence.

We need a toy world facilitating the creation of this algorithm.

Desiderata of toy world:

  1. As simple as possible.
  2. You can make a model of the world that obviously helps you.
  3. You can run experiments.
  4. The environment has time.
  5. Changes are localized.
  6. Laws of physics are symmetric

(3) Experiments are useful because:

  • Learning Minecraft by playing it is easier than reading the binary executable file and infering how to play minecraft from that.
  • We can investigate reality by having it compute itself. You can throw an apple and see how it flies without knowing the laws of aerodynamics.

(4) An atomic action mutates the world. Actions are not mathematical functions!

What are experiments:

  • I am in a room with a banana. I put the banana in my mouth to see how it tastes. (As opposed to computing the taste based on knowing the quantum wave function.)
  • To design an experiment you already need a model.
  • You want to prod reality such that you get the most useful information for extending your model.

(5) & (6) In the real world change is localized and the laws of physics are symetric. A localized and symetric world allows creating small models that make local predictions (localized change), and are applicable anywhere (symetry).

56.1. Block world πŸ”—

Goal: Environment good for developing an algorithm that can create a model of an environment.

The environment is an infinite 2D grid of cells.

We have a cartesian agent \(A\).

The grid contains an avatar for \(A\). \(A\) has the actions UP, DOWN, LEFT, RIGHT. \(A\) partially observes the grid around the avatar. The avatar interacts with a cell by walking next to it.

It seems better to have a god game, i.e. you can interact with the world from a cartesian perspective. Intuitively this should make experimenting easier.

Cell Behaviors:

  • RED - Cell disappears when interacted with
  • RED/GREEN - Cell with two states, red and green. Running into the cell changes the state.
    • Adjacent cells take on the state of the neighboring cell when it changes.
  • BLUE - Cell copies the cell East to North
  • BROWN - Cell is "moved" back
  • VIOLET - Position with avatar is swapped

56.1.1. Interactable πŸ”—

Here is a cell world with simple rules. See if you can figure out the rules via experimentation.

56.1.2. Algorithm πŸ”—

Goal: The algorithm autonomously explores the environment and constructs a model of the environment, useful for manipulating the environment. The model should allow the algorithm to reach any reachable environment state efficiently.

  1. Save changes in terms of functions \(f_i\) that describe the changes you observed (see the sketch below).
    • To construct the transformation matrix, only blocks that change are "captured".
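A minimal sketch of step 1 (my guess at a representation: grids as maps from [x y] coordinates to cell values):

(defn observed-change
  "Record an observed change as the cells that differ between two
  consecutive observations."
  [grid-before grid-after]
  (into {}
        (filter (fn [[pos v]] (not= v (get grid-before pos)))
                grid-after)))

(observed-change {[0 0] :red [1 0] :blue}
                 {[0 0] :red [1 0] :green})
;; => {[1 0] :green}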

:TODO:

57. World Modeling Subproblems πŸ”—

57.1. ground-truth-inference problem group πŸ”—

The ground-truth-inference problem group is about inferring a lossless world model:

  • Infer a data structure that can losslessly describe any ground truth world state.
    • E.g. a directed graph can describe an FSA.
  • Infer the ground truth transition function between world states.
  • Infer the current world state from observation.

Any problem subset is a concrete problem, e.g. infer the ground truth world state from knowing the transition function and the world state data structure.

If you are logically omniscient and you know the current world state and the transition laws, you "know everything".

57.2. efficient-model-derivation problem group πŸ”—

The efficient-model-derivation problem group is about creating an efficient world model from a lossless one:

  • Submodels
    • efficiently model a limited problem domain, e.g. assuming a resistor has constant resistance.
  • Abstraction

    an abstract model throws away or ignores information from the concrete model, but in such a way that we can still make reliable predictions about some aspects of the underlying system.
    What is Abstraction? β€” LessWrong

58. Detecting Structure in a Simple Cellular Automata   with_negar πŸ”—

We define a simple cellular automaton and visualize a particular trajectory such that the human visual cortex can easily detect structure.

See [2025-04-06 So] Detecting Structure: Building and analysing a Simple Cellular Automata for session notes containing motivation.

We define a simple one-dimensional cellular automaton.

(import numpy :as np)

(defn get-wraped [coll idx]
  (get coll (% idx (len coll))))

;; We want a rule that preserves locality and is symetric
;; (doesn't depend on what position in the grid it processes).
(defn apply-update-1D [state local-transition-function]
  "Use LOCAL-TRANSITION-FUNCTION to update every cell in the world state."
  (assert (= (len state.shape)
             1))
  (let [orig-state (np.copy state)
        world-size (get state.shape 0)]
    (for [i (range world-size)]
      (setv (get state i)
            (local-transition-function (np.array
                                         [(get-wraped orig-state (- i 1))
                                          (get-wraped orig-state i)
                                          (get-wraped orig-state (+ i 1))]))))))

(defn local-transition-function-1 [neighborhood]
  "Take in a neighborhood in the form of a length-3 array, and return the new state
for the center cell. Cells are either 0 or 1. Similar to Game of Life 'overcrowding'
rules."
  (assert (= neighborhood.shape #(3))
          f"Got a shape of {neighborhood.shape} but need {#(3)}")
  (return (% (+ (get neighborhood 0)
                (get neighborhood 2))
             2)))

(defn simulation-1 []
  (.seed np.random 42)
  (let [world-size 20
        world (np.random.randint 0 2 world-size)]
    (for [i (range 30)]
      (print world)
      (apply-update-1D world local-transition-function-1))))

(simulation-1)

We get the following textual output, where each line represents the state of the world at a timestep: the first line shows the initial world state, the second line the world state at timestep 2, etc:

[0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 0 1 1 1 0]
[1 0 1 0 1 0 1 0 1 0 1 0 0 1 0 0 1 0 1 1]
[1 0 0 0 0 0 0 0 0 0 0 1 1 0 1 1 0 0 1 0]
[0 1 0 0 0 0 0 0 0 0 1 1 1 0 1 1 1 1 0 0]
[1 0 1 0 0 0 0 0 0 1 1 0 1 0 1 0 0 1 1 0]
[0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 1 1 1 1 0]
[0 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 0 0 1 1]
[1 1 0 0 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1]
[0 1 1 0 1 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0]
[1 1 1 0 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0]
[1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 0 0 0 0 1]
[1 0 0 0 0 1 1 1 1 0 0 0 0 1 0 0 0 0 1 1]
[1 1 0 0 1 1 0 0 1 1 0 0 1 0 1 0 0 1 1 0]
[1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 0]
[1 0 0 0 0 0 0 0 0 0 0 1 1 0 1 1 0 0 1 0]
[0 1 0 0 0 0 0 0 0 0 1 1 1 0 1 1 1 1 0 0]
[1 0 1 0 0 0 0 0 0 1 1 0 1 0 1 0 0 1 1 0]
[0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 1 1 1 1 0]
[0 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 0 0 1 1]
[1 1 0 0 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1]
[0 1 1 0 1 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0]
[1 1 1 0 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0]
[1 0 1 0 1 0 0 1 1 0 1 0 1 0 0 0 0 0 0 1]
[1 0 0 0 0 1 1 1 1 0 0 0 0 1 0 0 0 0 1 1]
[1 1 0 0 1 1 0 0 1 1 0 0 1 0 1 0 0 1 1 0]
[1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 0]
[1 0 0 0 0 0 0 0 0 0 0 1 1 0 1 1 0 0 1 0]
[0 1 0 0 0 0 0 0 0 0 1 1 1 0 1 1 1 1 0 0]
[1 0 1 0 0 0 0 0 0 1 1 0 1 0 1 0 0 1 1 0]
[0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 1 1 1 1 0]

The patterns become easy to see when applying a Gaussian blur and adjusting levels on a screenshot of the above textual output:

blured-cellular-automata-1-blured-level-adjusted.webp

Figure 2: The Evolution of the cellular automata through time with a gaussian blur and level filter.

Running a height-map-to-normal-map algorithm using the Prewitt operator we get:

blured-cellular-automata-1-blured-normal-maped.webp

Figure 3: The Evolution of the cellular automata through time with a gaussian blur, levels, and heightmap to normal map filter.

Clearly there is a pattern here:

  • There are 5 triangles of zeros. All seem to have the same shape and size.
  • The first triangle is on the left, the second on the right, the third on the left again and so on.
  • The shapes above the triangles (as seen in Figure 4), which at first glance look random, are always the same for each triangle.

blured-cellular-automata-1-blured-normal-maped-above-triangle.webp

Figure 4: The pattern repeatedly appearing just above each triangle.

We, as the humans, now know that there is structure. What is the next step towards an algorithm that can extract structure from its observations into a world model?

Note that in this program there is no (possibly cartesian) agent that can interact with the running simulation. Creating such a setup might make the problem of designing the algorithm easier, because the algorithm would then have the ability to probe reality (i.e. perform experiments).

59. [2025-04-15 Di] πŸ”—

59.1. Notes πŸ”—

  • The Powder Toy is an open source particle simulation game (which might even be implemented as a cellular automaton). It has the structure we want (you can predict the future, but interactions can be complex).
  • There are \(3^{3^{3}} = 3^{27} = 7{,}625{,}597{,}484{,}987\) 3-color nearest-neighbor cellular automata (there are 27 match patterns (top part), and for each of them we need to choose one of 3 color values to make a single rule). Basically a cellular automaton ruleset can be described by a length-27 trit-string.
  • Looking at the last plot here, we can notice that a Sierpinski-triangle-like structure appears in the randomness of rule 30.

59.2. Ideas πŸ”—

59.2.1. Infering Interesting Rules πŸ”—

It might be possible to find interesting rules by providing an interesting cell configuration and automatically inferring what rules would generate that configuration.

  • We could create a rule-inferring engine. E.g. we specify a vertical column of cell values, and the program searches for a ruleset and an initial state that would generate this column. This can be generalized: we specify cell values for a region, and then infer the rules and initial state that would make this region appear.
    • It seems this would best be done in Prolog.

59.2.2. Cellular Automata with many colors and low K complexity rules model the real world πŸ”—

  • 3-color nearest-neighbor automata already have \(3^{27}\) possible rulesets. The number of rules grows exponentially in the number of colors.
  • It seems like there would be rules with low K complexity that still "produce all colors regularly".
  • Intuitively such rules could "look" complex, because you are shifting through many colors. But overall the description of the rules can be very simple.
  • This seems to be how our world works. There are many states that an "atom" can be in. However, the rules of interaction between atoms have low K complexity.
  • In other words: There are many possible world states, and there are many counterfactually possible trajectories through phase space, but the rules that generate these trajectories are simple. "Complexity" arises because there are many possible states, not because there are complicated rules.

All this implies that we should look for structure in cellular automata with many colors, where we generate the rules manually by writing a short program (short compared to a dict listing all transitions).

Alternatively it would be very easy to create a program template, e.g.:

(defun example-rule (a b c)
  (mod (P2 (P2 (P1 VAR-1)
               (P1 VAR-2))
           (P1 VAR-3))
       n))

Let the colors be represented by integers. The inputs to the function are the neighboring cells.

n is the number of colors, P2 and P1 are drawn from sets of 2-arity/1-arity functions that operate on integers, and VAR-n are assigned one of the neighborhood cells. We can then iterate over or sample small programs by substituting the capitalized symbols. Example instantiation:

(defun example-rule (a b c)
  (mod (- (+ (- b)
             (+ a))
          (+ c))
       n))

Note that all this can be done in the Wolfram Language!

2025-04-16-cellular-automata-many-colors-low-K-rules.webp

59.2.3. Cellular Automata Game πŸ”—

20250416_163746.webp

20250416_163755.webp

59.3. Tasks πŸ”—

High level Tasks

  • Run Cellular automata search.
    • We need a cellular automaton that exhibits structure such that we can predict into the future, but whose structure is not "trivial", like a Sierpinski triangle.
      • We probably want to search with different random initializations (not just a single-cell initial state).

Concrete tasks:

  • Look at trajectories with random initializations (use a seed) for all 256 elementary cellular automata.
    • Select the ones that have structure that propagates "straight down" (instead of diagonally).
    • Implement the rule in Gurkengla's visualizer and capture the structure interactively.
    • Analyse this structure and describe the patterns that appear.
  • [STUB] Do a similar thing for low-K-complexity, high-variability rulesets