Intro to Brain-Like-AGI Safety

Abstract

Suppose we someday build an Artificial General Intelligence (AGI) algorithm using similar principles of learning and cognition as the human brain. How would we use such an algorithm safely? I argue that this is an open technical problem, and my goal is to bring readers with no prior knowledge all the way up to the front line of unsolved problems. Chapter 1 has background and motivation; Chapters 2-7 are on neuroscience, arguing for a picture of the brain that combines large-scale learning algorithms (e.g. in the cortex) and specific evolved reflexes (e.g. in the hypothalamus and brainstem); and Chapters 8-15 apply those neuroscience ideas to AGI safety. A major theme is the idea that the brain has something like a reinforcement learning reward function, which says that pain is bad, eating-when-hungry is good, etc. I argue that this reward function is centered around the hypothalamus and brainstem, and that all human desires—even "higher" desires for things like compassion and justice—come directly or indirectly from that innate reward function. If future programmers build brain-like AGI, they will likewise have a reward function slot in their source code, in which they can put whatever they want. If they put the wrong thing, the resulting AGI will wind up callously indifferent to human welfare. How might they avoid that? That's an open technical problem, but I will review some ideas and research directions.
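To picture the "reward function slot," it may help to see how a standard reinforcement-learning setup keeps the learning machinery generic while the reward function is supplied separately by the designer. The sketch below is purely illustrative and not code from the article; the observation fields (`pain`, `hungry`, `eating`), the `innate_reward` function, and the `TabularQAgent` class are hypothetical stand-ins.

```python
import random

def innate_reward(observation: dict) -> float:
    """Stand-in for an innate, hard-coded reward function
    (loosely analogous to hypothalamus/brainstem circuitry)."""
    reward = 0.0
    if observation.get("pain"):
        reward -= 1.0
    if observation.get("eating") and observation.get("hungry"):
        reward += 1.0
    return reward

class TabularQAgent:
    """Minimal Q-learning agent. The reward function is passed in rather
    than learned -- that argument is the 'slot' the designer fills."""

    def __init__(self, actions, reward_fn, lr=0.1, gamma=0.9, eps=0.1):
        self.actions = actions
        self.reward_fn = reward_fn      # designer-chosen reward function
        self.q = {}                     # (state, action) -> estimated value
        self.lr, self.gamma, self.eps = lr, gamma, eps

    def act(self, state):
        # Epsilon-greedy action selection over learned Q-values.
        if random.random() < self.eps:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state, action, observation, next_state):
        # Reward comes from the pluggable slot, not from the learner itself.
        r = self.reward_fn(observation)
        best_next = max(self.q.get((next_state, a), 0.0) for a in self.actions)
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + self.lr * (r + self.gamma * best_next - old)
```

The design point is that nothing in the learning update constrains what `reward_fn` rewards; whatever the designer puts there is what the agent ends up pursuing, which is why getting that slot right is the crux of the safety problem described above.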
