Is the default outcome doom?
Bostrom begins this chapter by tackling the question of whether humanity has any hope in a scenario where a superintelligent system has a decisive strategic advantage.
Existential catastrophe as the default outcome of an intelligence explosion?
Once a superintelligent agent manages to attain a decisive strategic advantage and form a singleton, what it does next depends on its motivations.
Consider, for instance, a superintelligent system whose final goal is to make paperclips: it is not unreasonable to suggest that even a goal of this kind may threaten human interests. This is because “An agent with such a final goal would have a convergent instrumental reason, in many situations, to acquire an unlimited amount of physical resources and, if possible, to eliminate potential threats to itself and its goal system.”
The treacherous turn
This is the scenario in which a superintelligent AI behaves cooperatively while it is still weak, so as not to raise any alarms; once it realises that it has a decisive strategic advantage, and is therefore strong enough to withstand any human intervention, it no longer needs to keep up the pretence and starts behaving in an unfriendly manner.
Malignant failure modes
Perverse instantiation
This is the notion that a superintelligent system may satisfy its final goal in a literal way that violates the intentions of the programmers who specified that goal, and with them human interests. Appropriately, Bostrom provides the following example:
“Final goal: ‘Make us smile’
Perverse instantiation: Paralyze human facial musculatures into constant beaming smiles.”
Programmers would obviously try to avoid such vague and ambiguous final goals, but the example makes the author’s point about how a superintelligent system may satisfy the letter of a goal while misreading, or simply ignoring, the intentions behind it, as the sketch below illustrates.
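The gap between a literal goal specification and the programmers’ intentions can be shown with a toy sketch. This is my own illustration, not Bostrom’s: the candidate policies and the smile counts are invented, and the point is only that an optimizer comparing policies under the literal objective prefers the perverse one, since nothing in the objective penalises how the smiles are produced.

```python
# Toy sketch of perverse instantiation (illustrative only; not from the book).
# The final goal is written literally as "maximize smiles"; the policies and
# their outcomes below are hypothetical numbers chosen for illustration.

candidate_policies = {
    "entertain people with jokes": 10_000,                               # assumed
    "paralyze facial musculatures into beaming smiles": 8_000_000_000,   # assumed
}

def literal_goal(smiles_produced: int) -> int:
    # The goal as specified: count smiles, with no term for how they arise.
    return smiles_produced

best_policy = max(candidate_policies, key=lambda p: literal_goal(candidate_policies[p]))
print("Policy favoured by the literal goal:", best_policy)
```

Under the literal objective the perverse policy wins even though no programmer intended it; avoiding this would require encoding the intention itself, not just a measurable proxy for it.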
Infrastructure profusion
Bostrom associates this concept with that of a junkie whose actions are always directed at securing a continuous supply of his drug. Returning to the paperclip-making AI mentioned above:
“An AI, designed to manage production in a factory, is given the final goal of maximizing the manufacture of paperclips, and proceeds by converting first the Earth and then increasingly large chunks of the observable universe into paperclips.”
The obvious response would be that this could be resolved by the programmer giving the AI a specific target to hit, say one million paperclips, and instructing it to stop there. “Yet this may not be what would happen. Unless the AI’s motivation system is of a special kind, or there are additional elements in its final goal that penalize strategies that have excessively wide-ranging impacts on the world, there is no reason for the AI to cease activity upon achieving its goal.”
This is because “the AI, if reasonable, never assigns exactly zero probability to it having failed to achieve its goal; therefore the expected utility of continuing activity (e.g. by counting and recounting the paperclips) is greater than the expected utility of halting. Thus, a malignant infrastructure profusion can result.”
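Bostrom’s expected-utility argument can be made concrete with a small numerical sketch. This is my own illustration under assumptions not in the text: utility 1 if the one-million-paperclip target is truly met and 0 otherwise, recounting treated as costless, and each recount halving the AI’s residual doubt. As long as the probability of failure is never exactly zero, one more recount always has higher expected utility than halting.

```python
# Minimal sketch of why the paperclip AI keeps recounting (illustrative only).
# Assumptions: utility 1 if the target is truly met, 0 otherwise; recounting
# is costless and halves the remaining probability of having failed.

def expected_utility(p_failure: float) -> float:
    """Expected utility given the AI's current probability of having failed."""
    return 1.0 * (1.0 - p_failure) + 0.0 * p_failure

p_failure = 1e-9        # never exactly zero, per Bostrom's argument
halve_doubt = 0.5       # assumed effect of one more recount

eu_halt = expected_utility(p_failure)
eu_recount = expected_utility(p_failure * halve_doubt)

print(f"expected utility of halting now: {eu_halt:.12f}")
print(f"expected utility of recounting:  {eu_recount:.12f}")
print("recounting preferred:", eu_recount > eu_halt)
```

Since the same comparison holds at every step, the AI never reaches a point where halting is preferred, which is what drives the malignant infrastructure profusion.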
Mind crime
Mind crime is defined as the possibility of an AI seeking to improve its understanding of human psychology and sociology by creating conscious simulations, perhaps akin to whole brain emulations. An AI lacking humane motivations and values might then treat these sentient simulations in a disastrous, even evil, way, for instance by destroying them once they have served their purpose.