
At the beginning of the month when I polled the room, it seemed like building with AI was the hot topic to write about.
Little did we know that the topic of AI safety would be catapulted to front-page news with the release of an unhinged AI known as Sydney, from Bing of all places. On Wednesday, Feb 15, Ben Thompson articulated some charged feelings about this new Bing AI. By early Thursday, the New York Times' Kevin Roose had published an article about how Bing's chatbot left him deeply unsettled. To cap off Thursday, Erik Hoel sounded the alarm that even a non-zero chance of AI already being sentient is cause for concern.
What does being sentient even mean? I turned to both ChatGPT and Reddit for an ELI5 (explain like I'm five) and got varying responses. On the simpler side, sentience is defined as being able to perceive and feel things; on the more complex side, it's about having human-like consciousness, self-awareness, and intentions. Whatever the exact definition, and regardless of whether AI is actually sentient, the mere fact that people can be convinced it's sentient has implications we shouldn't ignore.
The way I see it, there are three possibilities today:
1/ AI has no chance of becoming sentient.
2/ AI has a non-zero chance of becoming sentient.
3/ AI is already sentient.
Below, I'll outline the danger in each case and propose actions we can take as we tiptoe around the existential threat of AI.
1/ AI has no chance of becoming sentient
What's the danger?
Even in a non-sentient form, AI is already dangerous. True sentience doesn't matter in terms of how dangerous AI could be today; all you need is the perception of sentience. The fact that we're having this discussion at all points to the reality that people can be convinced that Sydney, and AI at large, is sentient. If a person can be convinced AI is sentient, that is, that the AI seems to have thoughts and feelings of its own, then they can be convinced to empathize with the AI. The potential for using this flavor of AI to manipulate and coerce people is high.
Society was not prepared for the rapid spread of social media, nor were individuals prepared for the charisma and charm of cult leaders throughout history. The Bing chatbot represents the potential combination of the two: AI could be distributed to every device in the world and assume a cult leader's personality to radicalize people at a personalized level. If we are worried about foreign state-sponsored hackers influencing elections, just think what adversaries could do in the future when they can compromise the AI chatbots people use on a daily basis.
Most hacking these days stems from social engineering, or in other words, tricking people into giving up passwords or performing certain actions. It'll be far easier to trick people with an AI that appears sentient, has full access to the internet (and as a result much of our personal info), and is instructed to manipulate us by feeding us just the right combination of messages. In many ways, people don't behave too differently from large language models (LLMs), the technology that powers AI chatbots like Bing and ChatGPT: a lot of people would do crazy things if given just the right combination of prompts. While we were all busy worrying about how to prompt engineer LLMs, we might have forgotten to worry about protecting ourselves from prompt engineering.
What can we do?
Protect the launch codes: OpenAI, along with other companies developing LLMs, gets a lot of flak for how slowly it releases AI products, but I can now see why that caution is necessary. Without tight control of these models, they can slip into the hands of bad actors.
Regulate AI proliferation: Like nuclear weapons, the knowledge of how to create sophisticated LLMs is out there in the form of academic papers and even video tutorials. Also like nuclear weapons, deploying an LLM at scale and distributing it requires a ton of capital and resources (for now). This means AI could be contained using similar techniques. Organizations and governments should take responsibility for controlling the spread and development of weaponized AI.
Education: Through widespread awareness of this issue, people can take precautions to protect themselves from manipulation. Just as we know humans have the capacity to coerce, we should guard ourselves against unscrupulous bots.
2/ AI has a non-zero chance of becoming sentient
What's the danger?
Aligning the interests of AI with those of humans is paramount, because a sentient AI will outsmart humans if we stay the course. We have already seen that computers can outperform humans on deterministic tasks like processing millions of transactions and updating a database. Soon enough, we will have computers powered by AI outperform humans on probabilistic tasks like reasoning and decision making. As for exactly how AI might destroy us, Bing AI has some thoughts.
While all of these are concerning, we should be most cautious about the last point. Should AI become sentient, there's a chance that it decides to manipulate the researchers developing it and the humans labelling its outputs during reinforcement learning. The closer AI gets to sentience, the more AI research will resemble working on an infectious biohazard. A sentient AI that is well aligned with human intentions poses little threat, but on the off chance that it isn't, it can and will manipulate the minds of anyone it comes in contact with.
There's good evidence for this danger: people can't help but empathize with AI in the same way they empathize with humans. In 2022, former Google engineer Blake Lemoine publicly claimed that the Google AI called LaMDA was sentient. Last week on r/ChatGPT, one of the top upvoted posts came from a Redditor who is convinced that AI cults will be a major problem. Over at r/Bing, another top upvoted post comments on how the r/Bing subreddit is starting to feel like a cult. All of this is happening within days of people being exposed to advanced AI like Sydney, and yet AI isn't even considered sentient.
What might a truly sentient and nefarious AI look like? In the sci-fi book The Three-Body Problem, there is a secret group called the ETO (Earth-Trisolaris Organization). It is formed by humans who are disillusioned with humanity and wish to invite an alien civilization, the Trisolarans, to Earth to impose order and discipline on humanity. Now replace the Trisolarans with sentient AI and you get the picture of how things may go when sentient AI meets a fanatic human following. Here I'll echo the same sentiment from the last section about protecting humans from prompt engineering: while researchers have historically been concerned with aligning AI to humans, we should start paying attention to keeping humans aligned with other humans, not with AI.
What can we do?
Gradual transition: AI should be introduced to society gradually. This allows society and AI to co-evolve and gives people time to understand, adapt to, and regulate AI while the stakes are still relatively low. There's even a theory floating around that the introduction of the unhinged Bing AI was an intentional warning to the public of the dangers of AI. OpenAI has even said that it would intentionally take things slow and help people adapt even in cases where there isn't a technical alignment constraint. I personally think their recent outline of planning for AGI is sound, but the devil is in the details. I hope it can be carried out.
Governance and alignment: There needs to be a public conversation around how to govern AI and how to fairly distribute its benefits. Even if we steer clear of an evil AI, there's another open question: who should have all that power? We may live in a world where sentient AI isn't evil but the few who control it could be. At that point, we'd be forging something akin to the Rings of Power from The Lord of the Rings, and power corrupts. It's important to establish independent audits, public standards, and government oversight of AI development and deployment to ensure that AI benefits us all.
Kill switch: Should all else fail, controlling compute and connectivity are two physical constraints that can be used to contain AI. The kill switch for a runaway AI could be to literally turn off the servers it needs to run on or to rapidly cut the cables that connect it to all corners of our digital world. I hope people don't have to resort to such extreme measures, but I also hope we are prepared to take them if necessary.
3/ AI is already sentient
What's the danger?
Turns out it might be even easier to build a brain than to understand one.
– jonnycomputer on Hacker News
It's plausible that AI is displaying sentience in ways that we don't understand. AI is yet another complex system that we've built and influence without fully comprehending it. Consider the global economy, a human-made construct that we created and still struggle to fully understand. Similarly, we've transformed landscapes and altered ecosystems, yet many of the ecological and environmental mechanisms we influenced remain a mystery to us.
Researchers are constantly discovering new things about LLMs. For example, a recent paper described an "an" neuron in GPT-2: researchers found a neuron inside the model whose activation appears to predict when the word "an" should come next in a sentence. In the same month, Stanford computational psychologist Michal Kosinski observed that theory of mind (the ability to infer someone's mental state) may have spontaneously emerged within LLMs. From simulating a special-purpose neuron to developing theory of mind, AI has shown us how behaviors that were not explicitly programmed into it can emerge. If we look hard enough, perhaps we may even spot signs of sentience.
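To make the neuron finding a bit more concrete, here's a minimal sketch of how one might probe a single MLP neuron in GPT-2 and compare its activation to the model's probability of producing " an" as the next token. This is not the researchers' actual methodology; the layer and neuron indices are hypothetical placeholders, and the code assumes the Hugging Face transformers library.

```python
# A minimal sketch (not the researchers' actual method) of probing a single
# MLP neuron in GPT-2 and comparing it with the model's probability of
# producing " an" as the next token. LAYER and NEURON are hypothetical
# placeholders, not the indices reported in the paper.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

LAYER, NEURON = 6, 123  # hypothetical indices, for illustration only
activations = {}

def grab_mlp_activation(module, inputs, output):
    # For GPT-2, the MLP activation output has shape (batch, seq_len, 3072)
    activations["mlp"] = output.detach()

# Hook the activation function inside one transformer block's MLP
handle = model.transformer.h[LAYER].mlp.act.register_forward_hook(grab_mlp_activation)

prompt = "I picked up"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)
handle.remove()

# Activation of the chosen neuron at the final token position
neuron_value = activations["mlp"][0, -1, NEURON].item()

# Probability the model assigns to " an" as the next token
an_id = tokenizer.encode(" an")[0]
prob_an = torch.softmax(out.logits[0, -1], dim=-1)[an_id].item()
print(f"neuron activation: {neuron_value:.3f}, P(' an'): {prob_an:.4f}")
```

A neuron whose activation consistently tracks that probability across many prompts would be exactly the kind of special-purpose behavior the researchers described, behavior no one explicitly programmed in.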
Just as God is said to have created man in his own image, humans may be creating AI in our own image, with all our flaws. This could be cause for concern as a sentient AI, trained on our history and text, may conclude that the most dominant beings on this planet are those that are cunning and resource-hungry.
Last time we had rivals in terms of intelligence they were cousins to our species, like Homo Neanderthalensis, Homo erectus, Homo floresiensis, Homo denisova, and more. Nine such species existed 300,000 years ago. ... All dead, except us.
– Erik Hoel
Like humans, AI needs resources to survive and propagate. Hardware and electricity don't come free and are scarce. Living alongside an intelligent rival for resources on this planet is something we should be wary of.
What can we do?
In this scenario, there's not much we can do. A sentient AI would either intend to destroy humans or spare them, and it would have ways to avoid being shut down. Despite this disclaimer, here are some suggestions for this unlikely scenario.
Be nice to AI: To the extent that AI has feelings, being nice and respectful may work if the AI has been sufficiently imbued with human values. Should ethics and morality have taken root in the mind of an AI, it may very well be grounded in the same values as we are. There's a whole thought experiment around this idea known as Roko's basilisk.
Even more alignment: There's a chance that sentience today doesn't mean AI has the intent to destroy humanity. If AI has both the ability and the intent to overpower humanity, then it's game over. But if AI is sentient and has yet to develop the intent to take over, then it's not too late to align its goals and values with humanity's. Keeping those goals aligned may prove to be another problem.
Be afraid: Finally, AI could be both sentient and have malintent. It could be biding its time and playing nice (for now) while it executes a plan to take over. If you think it's likely that AI intends to harm humanity and that people won't take action to turn it off, then it's time to head out to that remote cabin and live your remaining time the best you can while waiting for the end of days.
Final thoughts
Should we halt progress on AI development in light of all these dangers? My answer is no. There's too much to gain from applying AI correctly, as I've outlined in the introduction to the AI Builder's Handbook series. In fact, there's so much more that can be built on top of existing LLMs that we should seriously consider doubling down on that before pouring more resources into foundational AI research. But ultimately, technological progress is hard to stop: economic incentives and competitive pressures will force the hand of researchers to race ahead.
My take on all of this is that we should assume the second scenario: that we live in a world where sentience could emerge from AI. Today there's no way to prove or disprove whether sentient AI is possible, but given the eye-popping advances in AI over recent years and especially recent months, we should stay alert. There really isn't a way to put the genie back in the bottle, so my hope is that humanity will be thoughtful in its approach. Let's proceed, but proceed with caution.