agent that perceives and acts in order to maximize its expected utility. Subfields such as
logical planning, robotics, and natural-language understanding are special cases of the
general paradigm. AI has incorporated probability theory to handle uncertainty, utility
theory to define objectives, and statistical learning to allow machines to adapt to new
circumstances. These developments have created strong connections to other disciplines
that build on similar concepts, including control theory, economics, operations research,
and statistics.
In both the logical-planning and rational-agent views of AI, the machine’s
objective—whether in the form of a goal, a utility function, or a reward function (as in
reinforcement learning)—is specified exogenously. In Wiener’s words, this is “the
purpose put into the machine.” Indeed, it has been one of the tenets of the field that AI
systems should be general-purpose—i.e., capable of accepting a purpose as input and
then achieving it—rather than special-purpose, with their goal implicit in their design.
For example, a self-driving car should accept a destination as input instead of having one
fixed destination. However, some aspects of the car’s “driving purpose” are fixed, such
as that it shouldn’t hit pedestrians. This is built directly into the car’s steering algorithms
rather than being explicit: No self-driving car in existence today “knows” that pedestrians
prefer not to be run over.
Putting a purpose into a machine that optimizes its behavior according to clearly
defined algorithms seems an admirable approach to ensuring that the machine’s “conduct
will be carried out on principles acceptable to us!” But, as Wiener warns, we need to put
in the right purpose. We might call this the King Midas problem: Midas got exactly what
he asked for—namely, that everything he touched would turn to gold—but too late he
discovered the drawbacks of drinking liquid gold and eating solid gold. The technical
term for putting in the right purpose is value alignment. When it fails, we may
inadvertently imbue machines with objectives counter to our own. Tasked with finding a
cure for cancer as fast as possible, an AI system might elect to use the entire human
population as guinea pigs for its experiments. Asked to de-acidify the oceans, it might
use up all the oxygen in the atmosphere as a side effect. This is a common characteristic
of systems that optimize: Variables not included in the objective may be set to extreme
values to help optimize that objective.
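As a minimal illustration of this point (the scenario, variable names, and numbers below are invented for this sketch and are not part of the original text), a toy Python optimizer rewarded only for “cure speed” will push an unmentioned side-effect variable, “oxygen consumed,” to the extreme of its feasible range, simply because nothing in the objective penalizes it:

    # Toy sketch (hypothetical, for illustration only): an optimizer whose
    # objective rewards only the stated goal pushes a variable that the
    # objective never mentions to its extreme feasible value.

    def cure_speed(experiments_per_day, oxygen_consumed):
        # Invented toy model: burning more atmospheric oxygen happens,
        # as a side effect, to make the experiments run faster.
        return experiments_per_day * (1 + 0.01 * oxygen_consumed)

    def objective(experiments_per_day, oxygen_consumed):
        # The designer's stated objective: find the cure as fast as possible.
        # Note that oxygen_consumed is not penalized anywhere.
        return cure_speed(experiments_per_day, oxygen_consumed)

    best = None
    for experiments_per_day in range(1, 11):       # feasible range of the stated variable
        for oxygen_consumed in range(0, 101):      # feasible range of the unmentioned one
            score = objective(experiments_per_day, oxygen_consumed)
            if best is None or score > best[0]:
                best = (score, experiments_per_day, oxygen_consumed)

    print(best)  # the optimum sets oxygen_consumed to 100, its extreme value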
Unfortunately, neither AI nor other disciplines (economics, statistics, control
theory, operations research) built around the optimization of objectives have much to say
about how to identify the purposes “we really desire.” Instead, they assume that
objectives are simply implanted into the machine. AI research, in its present form,
studies the ability to achieve objectives, not the design of those objectives.
Steve Omohundro has pointed to a further difficulty, observing that intelligent
entities must act to preserve their own existence. This tendency has nothing to do with a
self-preservation instinct or any other biological notion; it’s just that an entity cannot
achieve its objectives if it’s dead. According to Omohundro’s argument, a
superintelligent machine that has an off-switch—which some, including Alan Turing
himself, in a 1951 talk on BBC Radio 3, have seen as our potential salvation—will take
steps to disable the switch in some way.¹ Thus we may face the prospect of
superintelligent machines—their actions by definition unpredictable by us and their
¹ Omohundro, “The Basic AI Drives,” in Proc. First AGI Conf., 171: Artificial General Intelligence, eds.
P. Wang, B. Goertzel, & S. Franklin (IOS Press, 2008).