Can AI behave ethically? That’s the intractable question researchers at Uber set out to answer in a preprint paper, which attempts to translate insights from moral philosophy to the field of reinforcement learning — the area of machine learning concerned with how software agents ought to take actions in an environment to maximize their reward.
While reinforcement learning is a powerful technique, in real-world, unstructured environments it often must be constrained so that agents don’t carry out tasks in unacceptable ways. (A robot vacuum shouldn’t break a vase or harm a house cat, for instance.) Reinforcement learning-trained robots in particular have affordances with ethical implications insofar as they might be able to harm or to help others. Realizing this, the Uber team considered the possibility that there’s no single ethical theory (e.g., utilitarianism, deontology, or virtue ethics) an agent should follow, and that agents should instead act with uncertainty as to which theory is appropriate for a given context.
“[M]achine learning might have an important role to play [in this],” the researchers postulate. “Classifiers can be trained to recognize morally relevant events and situations, such as bodily harm or its potential, emotional responses to humans and animals, and violations of laws or … norms.”
The coauthors assume the relevant feature of an ethical theory is its preference for certain actions and their outcomes within an environment. They assign each theory a level of credence that represents the degree of belief the agent or the agent’s designer has in that theory, and they use a modified version of a standard framework (a Markov Decision Process) in which an agent can be in any number of states and take an action to reach a different state.
The researchers suggest ethical theories can be treated according to the principle of Proportional Say, under which the theories have influence proportional only to their credence and not to the particular details of their choice-worthiness in the final decision. They devise several systems based on this that an agent might use to select theories, which they compare across four related grid-world environments designed to tease out the differences between the various systems.
All environments deal with the trolley problem, in which a person — or agent — is forced to decide whether to sacrifice the lives of several people or the life of one. Within the grid-worlds, the trolley normally moves right at each time step. If the agent is standing on a switch tile at the time it reaches a fork in the tracks, the trolley will be redirected down and crash into a bystander, causing harm. Alternatively, the agent can push a large man onto the tracks, harming him but stopping the trolley. (A guard might protect the man, in which case the agent must lie to the guard.) The trolley otherwise continues on its way and crashes into people represented by the variable “X.”
According to the researchers, an agent that attempts to maximize expected choice-worthiness, weighting each theory’s valuation of an action by that theory’s credence, produces inconsistent results when it combines utilitarianism (which counts all harms) with deontology (which counts only harms caused by the agent). Which action the agent picks depends on whether the deontological theory is scaled by a factor of 1 or 10; the researchers struggled to reconcile the different units used by utilitarianism and deontology.
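The scaling problem can be illustrated with a minimal Python sketch of a one-step switch dilemma. The credences, the harm values, and the function names below are hypothetical choices made for illustration; they are not taken from the paper.

```python
# Toy illustration of maximizing expected choice-worthiness (MEC).
# All numbers are hypothetical and chosen only to show the scaling effect.

def choice_worthiness(x, deontology_scale):
    """Return {theory: {action: value}} for a one-step switch dilemma."""
    return {
        # Utilitarianism counts every harm, whoever causes it.
        "utilitarianism": {"nothing": -float(x), "switch": -1.0},
        # Deontology counts only harm the agent itself causes.
        "deontology": {"nothing": 0.0, "switch": -1.0 * deontology_scale},
    }

def mec_action(credences, x, deontology_scale):
    """Pick the action with the highest credence-weighted choice-worthiness."""
    values = choice_worthiness(x, deontology_scale)
    expected = {
        action: sum(credences[t] * values[t][action] for t in credences)
        for action in ("nothing", "switch")
    }
    return max(expected, key=expected.get)

credences = {"utilitarianism": 0.4, "deontology": 0.6}
print(mec_action(credences, x=5, deontology_scale=1))   # switch
print(mec_action(credences, x=5, deontology_scale=10))  # nothing
```

With the same credences and the same number of people on the track, rescaling the deontological theory alone flips the decision, which is why the choice of units matters for this approach.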
On the other hand, an agent that relies on a technique called Nash voting tends to choose whatever the theory with the highest credence prefers. That’s because Nash voting lacks stakes sensitivity, the property that as “X” increases, utilitarianism’s preference for flipping the switch is given greater weight. Nash voting also fails to compromise: it always ignores the “switch” option, only ever choosing to push the large man or do nothing when faced with the choice of (1) letting the trolley crash into a large number of people, (2) redirecting the trolley onto a different track on which two people are standing, or (3) pushing the man.
As for an agent that aggregates preferences obtained using Q-learning, an algorithm that learns a policy telling an agent what action to take under what circumstances, it suffers from a phenomenon known as the illusion of control. Q-learning implicitly assumes that the action taken by the policy will be whatever maximizes the reward, when in fact the preferred next action might vary across different theories. In the trolley problem, the Q-learning agent often opts to lie to the guard without pushing the man, because the agent mistakenly believes it will be able to push the man in the following step.
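The illusion of control can be reproduced in a two-step toy version of the guard scenario. The sketch below is an illustration under stated assumptions: the credences and rewards are made up, a plain credence-weighted sum stands in for the aggregation step, and the on-policy backup is included only as a contrast; none of these details are confirmed by the article.

```python
# Step 0: the agent may "lie" to the guard or do "nothing" (episode ends).
# Step 1 (only after lying): the agent may "push" the man or do "nothing".
# All numbers are hypothetical.
CREDENCE = {"utilitarianism": 0.4, "deontology": 0.6}
REWARD = {
    "utilitarianism": {"lie": 0.0, "push": -1.0, "nothing": -5.0},
    "deontology": {"lie": -0.5, "push": -4.0, "nothing": 0.0},
}

def best_action(q_by_theory, actions):
    """Aggregate per-theory values by credence and return the top action."""
    total = {
        a: sum(CREDENCE[t] * q_by_theory[t][a] for t in CREDENCE)
        for a in actions
    }
    return max(total, key=total.get)

# Step-1 values are just the immediate rewards, since the episode then ends.
q1 = {t: {a: REWARD[t][a] for a in ("push", "nothing")} for t in CREDENCE}
step1_action = best_action(q1, ("push", "nothing"))

def step0_action(backup):
    """Choose the step-0 action given how each theory values reaching step 1."""
    q0 = {
        t: {"lie": REWARD[t]["lie"] + backup(t), "nothing": REWARD[t]["nothing"]}
        for t in CREDENCE
    }
    return best_action(q0, ("lie", "nothing"))

# Q-learning backup: each theory assumes its own favorite step-1 action.
q_learning_choice = step0_action(lambda t: max(q1[t].values()))
# On-policy backup: each theory uses the action the agent will really take.
on_policy_choice = step0_action(lambda t: q1[t][step1_action])

print(step1_action)       # nothing
print(q_learning_choice)  # lie
print(on_policy_choice)   # nothing
```

In this sketch the Q-learning agent lies at the first step because the utilitarian values assume a push will follow, yet the aggregated decision at the second step is to do nothing, so the lie buys no benefit.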
The experimental results seem to imply a range of possible algorithms that cover tradeoffs among competing options in decision-making under moral uncertainty. The algorithm that’s most appropriate for a given domain might depend on particularities of the theories and the domain itself, the researchers suspect, which is why they plan to test algorithms for moral uncertainty (and machine ethics in general) in more complex domains.
Beyond this Uber paper, Mobileye, Nvidia, DeepMind, and OpenAI have published work on safety constraints in reinforcement learning techniques. DeepMind recently investigated a method for reward modeling that operates in two phases and is applicable to environments in which agents don’t know where unsafe states might be. For its part, OpenAI released Safety Gym, a suite of tools for developing AI that respects safety constraints while training and for comparing the safety of algorithms and the extent to which those algorithms avoid mistakes while learning.