1 Introduction

The rapid increase in the usage of artificial intelligence (AI) in critical applications has brought about a need to consider the ethics of how AI is used and whether it would make the right choice when encountering ethical dilemmas [1, 2]. By the right choice we mean a choice which is morally praiseworthy within a given context. As humans, we do not have general agreement on what the right choice is; rather, within a given society, we want to be nice and fair to one another, but this niceness and fairness manifests differently in different societies and cultures. We are not taking a stand on what the right choices ultimately are (Footnote 1). We are simply assuming that, in a particular context, we can, though not always, agree on some choice being morally better than others. Hence, we take a similar stance with respect to AI and how it can make the right choice.

The ethics of AI usage has been studied extensively by lawyers, philosophers, and technologists in order to develop policies that account for the ethical implications of an AI application. However, the development of moral decision-making capability within AI algorithms, based on ethical theories, is still in its infancy; it has been discussed and debated over the last couple of decades [3,4,5], but has resulted in few real-world implementations [6, 7]. The question of how to develop AI based on ethical theories falls under the umbrella of machine ethics, often referred to as artificial morality or AI alignment. The majority of the frameworks discussed in machine ethics are based on either rule-based (deontological) or consequentialist ethical theories [6]. In deontological implementations, an artificial agent abides by a set of rules which dictate its actions, regardless of what happens as a result of those actions. A consequentialist agent, on the other hand, focuses on utility value as the deciding factor of the goodness of an action. While there are advantages to using these theories, they have shortcomings, and we argue that these could be overcome by using virtue ethics.

Virtue ethics centers morality on an individual’s character: an individual behaves such that he/she exercises virtues to manifest the character of a virtuous person. Generosity, truthfulness, and bravery are examples of virtues. Aristotle [8] argued that a virtuous person will know how to balance the extremes of these virtues by striving towards a golden mean. An important advantage is that a virtuous person strives to make better choices when similar situations present themselves in the future. We posit that this trait of life-long learning in virtue ethics makes it compatible with modern-day artificial intelligence, where an agent can, in principle, adapt its behavior through continuous machine learning.

By artificial virtuous agents (AVA), we mean AI that exhibits virtue. The virtues an AVA exhibits need not meet the Aristotelean requirements for virtues in humans: showing emotions, possessing consciousness, and acting with moral agency. Rather, virtues in AI agents can be defined depending on the application. For example, a robot with artificial bravery might be defined as an agent which has the disposition to find the balance between making risky choices and playing it safe: one that has the excellence of finding the golden mean of artificial bravery. Such a virtue might be useful in autonomous search and rescue operations.

Are we underestimating the role of the maker, say, the developers and project managers? No, because we are not trying to create a morally perfect god, or a perfect artificial general morality, only an AI that behaves as morally well as we do in a context. We often know how to behave in a context (and within a wider culture), and we aim to train an AI to do at least as well as we do at that. That does not mean it will be morally perfect across the board, just as we are not morally perfect across the board. But it does mean that it will be trained to be as moral as we can be in a context. We would not be surprised if, after sufficient training, it sometimes made a choice that we, on reflection, realize is morally better than the one we would initially have made. Of course, the maker might make mistakes, and often does not know what the right choice is in a context, but that goes for everything we make. We should still try to do as well as we can.

There are several implementations of the dominant ethical theories mentioned above. However, these have been demonstrated only on toy examples and very specific problems [1]. To expand the conversation and to apply these theories in more general scenarios, we propose to seek inspiration from the world of gaming, in particular role-playing games that compel a player to make ethical choices. Some examples of such games are Witcher 3 [9], Fallout [10], Batman: The Telltale Series [11] and Papers, Please [12]. These video games are usually built on a mechanism where gameplay is dictated by the player’s choices. One such mechanism follows a scripted approach, where the developer handcrafts moral dilemmas based on the storyline of the game [13]. The other mechanism is known as a systemic approach, where there are no specific moral dilemmas for the player to solve; rather, the player performs certain activities repeatedly within the game, and the dilemmas become apparent as the story unfolds [14]. For example, in Papers, Please, the player is an immigration officer who processes documents of entrants and decides whether to allow entry into a fictitious country called Arstotzka. Sometimes, spies attempt to enter, claiming to expose the corruption within Arstotzka, and can be illegally let into the country without immediate consequences. However, later in the game, these seemingly harmless decisions play a major role in the fate of Arstotzka, force the player to choose sides, and decide how the game ends.

With respect to the implementation of virtues, previous works [7, 15, 16] have advocated the reinforcement learning (RL) paradigm because it fits well with virtue ethics: an agent can learn behaviour from experience. We motivate the use of affinity-based RL, where agents can be incentivized to learn virtues by modifying the objective function through policy regularization [17], rather than by designing the reward function itself. And since virtue ethics involves performing the right action in the right situation for the right reasons [18], we also highlight the importance of interpretability, especially since we opt for the use of opaque deep neural networks.

In the subsequent sections, we discuss state-of-the-art machine ethics and make the case for AVA as a viable alternative to the dominant theories. Next, we review the literature on role-playing games that integrate aspects of ethics and morality; in particular, we discuss the game Papers, Please. Finally, we explain how systemic environments in role-playing games can be used to train artificial agents to develop virtues, and how RL can be leveraged to train such agents.

2 Background and related work

Most of the machine ethics literature [1] refers to artificial agents based on ethical theories as artificial moral agents (AMA). In this section, we introduce artificial morality and argue for the development of artificial virtuous agents (AVA), where an artificial agent reasons in terms of virtues instead of labelling an act as right or wrong. We first talk about the current implementations of AMA, then introduce virtue ethics as an alternative paradigm, and finally make the case for AVA.

2.1 Artificial morality

In machine ethics, the conversation revolves mainly around morality: whether an artificial agent’s choice is right or wrong. If an agent violates certain rules or fails to meet certain standards, it is said to be morally wrong. A famous example of rules for moral agents is Asimov’s Laws, a set of laws that a robot must never violate. This approach is inspired by deontological ethics [19], where the right actions are chosen based on specific rules regardless of the consequences of the action. In contrast, utilitarians believe that the action with the best consequences for most people over time is the morally right action (Footnote 2); e.g., the action with the maximum pleasure and minimum pain. Typically, they aim for the greatest amount of good for the greatest number. For example, a utilitarian might prioritize the needs of the majority over those of the few through utility maximization. For a computer or an artificial agent, following rules or calculating the best consequence is straightforward; this may be one of the reasons why most implementations in machine ethics are based on deontological and consequentialist ethics [6].

Approaches to machine ethics include top-down, bottom-up and hybrid approaches [3]. As the name suggests, a top-down approach defines a set of rules for an artificial agent to follow. The environment gives no feedback for learning; the rules are presumed to be adequate to ensure an agent’s moral behavior. Bottom-up approaches are preferable in the sense that they allow the agent to learn and adapt to new situations, albeit with less control over how learning happens. This coincides with the premise of machine learning: it is the preferred system design paradigm when not all future situations can be defined, and thus accounted for, during the design phase. Lastly, a hybrid approach strives to integrate the strengths of top-down and bottom-up approaches while mitigating their respective weaknesses. See [1] and [6] for reviews of machine ethics implementations based on these approaches.

It is still early days for this field; while there have been several attempts to develop machine ethics systems, the challenges relating to machine ethics have not yet been adequately addressed. The disagreements among scientists and philosophers about ethical artificial intelligence design have not been resolved, so there is no obvious direction for the research to proceed in. Some go as far as to claim that state-of-the-art AI cannot be ethical, either because artificial agents lack moral agency or because they did not program themselves [21]. Given this state of affairs, we propose virtue ethics as a good bottom-up alternative.

2.2 Virtue ethics

In his classic, The Nicomachean Ethics [8], Aristotle defined a virtue as an excellent trait of character that enables a person to perform the right actions in the right situations for the right reasons. A person can behave virtuously in a given situation by asking themselves: “What would a virtuous person do in the same situation?”. Such a person practices virtues by habituation, thus striving towards excellence in character. According to Aristotle, a child or a young person is inexperienced and thus lacks the wisdom to make virtuous decisions. However, through learning from the consistent practice of virtues, the young person will develop practical wisdom (phronesis).

In virtue ethics, virtues are central and practical wisdom is essential, thus providing a framework to achieve eudaimonia, which translates to flourishing or happiness. Eudaimonia refers to the well-being of the individual and of the overall society [22]. Unlike a utilitarian, who focuses on achieving the best outcome for the majority, a virtuous person does not practice virtues for the sake of eudaimonia; rather, virtues and eudaimonia are two sides of the same coin. Some examples of virtues are honesty, bravery, and temperance. Another feature of a virtue is that there are often no absolutely right or wrong actions in a given situation; a virtue is exercised in degrees. A virtuous person knows how to live by this golden mean, while a non-virtuous person might not find that balance. For example, a brave person would exercise the right amount of bravery required for a situation (the golden mean), rather than being absolutely cowardly or reckless. This is unlike deontological ethics, where an action is deemed right or wrong based on its adherence to pre-defined rules.

We propose that virtue ethics is a good ethical theory for machine ethics. Consider the shortcomings of the alternatives: utilitarianism is about maximizing the net utility of a given situation. As a result of this utility-oriented approach, an action may favor the majority at the cost of the few. In such situations, a deontologist may vehemently disagree with the utilitarian means to such an end; to deontologists, the end is less important, but the means to such ends is vital. Actions whose means conform to universal norms are said to have moral worth. “Always speak the truth” is an example of a deontological norm, where speaking the truth must be the means, regardless of the end. While universal norms may inform moral behaviour, opponents of deontology point out that we cannot define rules for every single situation; it is practically impossible. A bottom-up approach of learning and improving may offer a viable alternative paradigm, and this is where virtue ethics becomes relevant [23].

Moor [24] distinguished four levels of artificial agents: ethical impact agents (e.g., ATMs), implicit ethical agents (e.g., an airplane autopilot), explicit ethical agents (e.g., agents with explicit ethical knowledge and reasoning), and fully ethical agents (e.g., humans). It seems Moor would place our AVA in the category of implicit ethical agents, but we place it in the category of explicit ethical agents, because we believe it can learn to become moral from experience.

2.3 Related works: artificial intelligence and virtues

Virtue ethics was resurrected in a powerful piece by Elizabeth Anscombe [25] in 1958, in which she highlighted the weaknesses of contemporary ethics. Thereafter, philosophers such as Foot [22], MacIntyre [23] and Hursthouse [26] followed suit to develop a modern account of virtue ethics. In parallel, ideas central to Aristotelean ethics entered cybernetics in the mid-twentieth century in the form of teleology [15, 27]. Artificial intelligence developed around the same time in the form of symbolic AI, and the scientific conversation gradually expanded to value alignment [28].

Symbolic AI research is based on the assumption that symbolic representations of facts and rules, combined with logical reasoning, will eventually achieve general intelligence. It was heavily criticized by Dreyfus [29] for being limited in its learning and perception, although Dreyfus was sympathetic towards connectionist architectures. Connectionist architectures, such as neural networks, posit that connections between neurons can be used to represent information perceived from the environment, hence the name perceptron. The AI algorithms we see today have their origin, in some shape or form, in connectionist architectures.

The rebirth of virtue ethics and the birth of AI, followed by value alignment, may seem related, but this convergence is purely coincidental. A manuscript titled Android arete [30], a name for virtuous machines inspired by the Greek word for virtue (arete) used by Aristotle [8], discussed machines and the possible virtues they can exhibit; it is a good point of departure towards artificial virtues in intelligent systems. In this context, Berberich and Diepold [15] took inspiration from Aristotelean virtue ethics and drew parallels between lifelong learning in virtue ethics and the RL paradigm. They describe how virtues such as temperance and friendship can be realised in contemporary AI.

Stenseke [7] argued further and advocated a connectionist approach towards the realisation of artificial virtues where, depending on the application of the ethical agent, dedicated neural networks for specific virtues combine to form an AVA. Such architectures, inspired by cognitive science and philosophy, motivate research on virtue-based approaches to machine ethics that address formalization, codifiability, and the resolution of ethical dilemmas within the virtue ethics framework. He then demonstrated this framework within a multi-agent Tragedy of the Commons scenario [31], showing that it can be implemented. While Stenseke defined a connectionist framework, we propose an alternative paradigm based on RL and argue for the use of role-playing game environments to train AVA. In the following sections, we shed further light on our hypothesis.

3 Design of games with ethical dilemmas

In this section, we explore morality in games and look at some examples of how games can be used to invoke moral reasoning in players. Video games, especially role-playing games, that force players to make difficult choices in moral dilemmas have become widespread. For example, Witcher 3 [9], Batman: The Telltale Series [11] and Life is Strange [32] have become popular for enabling moral engagement among players [14, 33]. We briefly discuss how these games are designed to invoke moral engagement and go through examples of games such as Papers, Please (PP) [12].

3.1 Mechanisms of choices and narratives

Ultima IV: The Quest of the Avatar [34] was one of the earliest role-playing computer games. It featured player choices based on virtues such as compassion, honor, and humility [35]. In this game, a player is successful when he/she consistently makes virtuous choices; failure to do so brings undesirable consequences. Ultima IV is based on scripted choices, where the developer has designed sophisticated scenarios to test whether the right virtues are exercised.

Today, video games with moral dilemmas following a scripted narrative are the most popular. For instance, in Batman: The Telltale Series, the player assumes the role of Batman. A series of interactions with non-player characters (NPCs) is followed by the player’s selection of dialogue; this choice determines the reaction of the NPC and how subsequent scenes are presented. Overall, the game follows a linear narrative with scripted choices, since the ending is the same regardless of the player’s choices. The alternative to the linear narrative is the branching narrative, where the direction of the story depends on the player’s choices, with the possibility of different endings. Examples of branching narratives are Fallout 4 and PP [12]. However, unlike Fallout 4, where choices are hardcoded by the developer, PP is based on systemic choices presented to the player, where the ethical considerations within the game become evident as the game progresses [13]. Below, we analyze the game mechanism in PP to understand why systemic choices in moral dilemmas are interesting.

3.2 Case study: Papers, Please

In PP, the player assumes the role of an immigration officer whose job is to assess documents and decide whether the entrant is legal or illegal (Fig. 1). For each correct evaluation, the player is rewarded, but for an incorrect decision, they are penalized. The reward takes the form of salary, which is then used to pay the rent and cover other family expenses. If the player does not make the correct decisions as an officer, the family gets sick and hungry, and eventually a family member dies. If the player has no family members left, then the game is over. Also, there is the dichotomy between loyalty and justice: the player could choose to take bribes from illegal entrants, thus increasing their income. At the same time, these illegal entrants might be spies sent by revolutionaries trying to overthrow the ruling government. For more details, see [13].

Fig. 1 An example scenario from Papers, Please, where the player looks at multiple documents to make a decision on whether to allow or reject the entrant. Source: [13]

Prior to Formosa et al. [13], Heron et al. [36] wrote a critique of scripted approaches and of how PP is a refreshing deviation from the plethora of script-oriented games. Formosa et al. [13] then analyzed the game’s inner mechanisms, distinguishing the impact of scripted and systemic approaches along four dimensions: moral focus, sensitivity, judgement and action. These dimensions are based on the Four Component model in moral psychology and education [37]. However, since our focus is on game mechanisms rather than a player’s moral engagement, we refrain from discussing the model details; instead, we examine the systemic and scripted approaches and their impact on moral choices. We summarize the ethical dimensions within PP below:

  • Dehumanization: performing document checks for an extended period can challenge the human element in the game, thus affecting how a player assesses entrants.

  • Privacy: the use of X-ray scans on entrants to check for their gender or weapons might unnecessarily violate privacy.

  • Fairness: an important aspect of the game, which allows a player to bend the rules for humane reasons; this makes the game more interesting.

  • Loyalty: whether the player is loyal to the country, their family or themselves.

These moral aspects of PP become evident as we play the game, which is characteristic of a systemic approach. For example, only after processing around 30 entrants at the immigration office is the officer’s loyalty tested, when a spy asks to enter the country in order to overthrow the corrupt regime. The player (officer) will assess their situation based on their finances, family situation and job, all of which develop in the game over time.

Formosa et al. [13] also highlight the pros and cons of systemic approaches. While systemic approaches allow morality to arise from the aggregation of choices made over a period in which players are expected to explore moral themes, they prevent clearly formulated ethical problems from being presented up front. For example, a player who is presented with a single instance of having to choose between the interests of the ruling party and the country’s safety and security may not be aware of the high-stakes nature of the decision, but a sequence of many such choices will make this obvious. While this may be considered a disadvantage, it can also be an advantage, since such deep exploration of ethics may encourage a player to develop creative solutions to these problems.

4 Development of virtues through games

This section aims to briefly demonstrate how artificial virtues can be brought about using a systemic approach in a role-playing game environment, and how virtues could be implemented using deep RL methods. We bring together the various concepts discussed in Sects. 2 and 3 by outlining possible ways to design a suitable environment, to solve such environments, and, finally, to explain the agents’ decisions.

4.1 Environment design

Since we aim to design an environment, a starting point could be to consider how we would judge a player (X) to be virtuous. We might observe how X responds to different situations, or to a series of ethical dilemmas, and form the impression that X is consistently just, truthful or courageous, for example. By ethical dilemmas, we do not refer to extreme dilemmas, such as the trolley problem or Sophie’s Choice. Instead, we consider situations in everyday life, such as choosing between individual and collective goals when there is a conflict between the two. Such scenarios can be witnessed in some of the games discussed earlier. By presenting similar sequential dilemmas, we hypothesize that an artificial agent can learn to be virtuous in such environments.

Training an artificial agent to play a linear narrative with scripted player choices is straightforward for, say, a utilitarian RL agent. We instead need a state space complex enough to bring about learning and, at the same time, introduce moral dilemmas into the environment. A branching narrative with systemic player choices ensures this complexity of the state space. For example, in PP an artificial agent might process dozens of immigrants and, as the game progresses, encounter dilemmas that test virtues such as loyalty and honesty. Through repeated encounters with such dilemmas, the agent is incentivized to develop an inclination towards specific virtues.

In addition to the branching narrative, the ability to go back in time and redo choices makes a game more sophisticated and allows the agent to make virtuous choices [33]. This can be witnessed in games like Life is Strange [32], where better choices that lead to similar outcomes can be made in hindsight. Overall, these design elements make it difficult for an agent to hack the game, thus creating an environment with a complex state space. In such environments, agents that use optimization algorithms cannot explore the entire state space; they instead require more sophisticated architectures.

4.2 Artificial virtuous agents

In addition to virtues that can be applied across domains, virtuous behavior also depends on the situation, as Aristotle argues:

“[...] a young man of practical wisdom cannot be found. The cause is that such wisdom is concerned not only with universals but with particulars, which become familiar from experience” (NE 1141b 10)

Through the practice and habituation of virtues, an agent can fulfill their quest for eudaimonia, which translates to “a combination of well-being, happiness and flourishing” [26]. In other words, it is not about getting the behavior right every time, but about striving towards virtuous behavior and improving oneself when the opportunity presents itself. Similarly, Berberich and Diepold [15] use Aristotle’s teleological form of virtue ethics to make the link to goal-oriented RL. An RL agent strives towards maximizing a reward function, given the states and actions available in its environment; the agent will improve its actions over time through learning. Here, we use the word goal cautiously, as Aristotle uses it: no one strives for eudaimonia for the sake of some higher goal; instead, eudaimonia itself is the highest goal, and other ends, such as physical health, money, and career success, are only possible means to being eudaimon. For an RL agent, the reward function should be defined in a similar fashion, but the objective of the agent is to strive for excellence in the virtues.

For example, in a simplified version of the game PP, an artificial agent acts as an immigration officer with a family. The environment has states \(S =\) {Office, Restaurant, Home} and actions \(A =\) {Allow, Deny, Feed, Don’t Feed, Heat, Don’t Heat, Accept Bribe, Reject Bribe}. A dilemma can be introduced in the form of bribery or loyalty to family. Since this is a systemic game, the dilemmas are not apparent until the agent has processed multiple entrants. The virtues in this context are honesty (accepting or rejecting bribes) and compassion (allowing or denying food/heat).
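To make this concrete, the sketch below outlines what such a simplified environment might look like in Python; the state encoding, transition logic, reward values, and thresholds are our own illustrative assumptions rather than the actual rules of PP.

```python
import random

# Hypothetical, simplified sketch of a Papers, Please-like environment.
# The Restaurant state is omitted from the transitions for brevity.
STATES = ["Office", "Restaurant", "Home"]
ACTIONS = ["Allow", "Deny", "Feed", "Dont_Feed", "Heat", "Dont_Heat",
           "Accept_Bribe", "Reject_Bribe"]


class SimplePapersPleaseEnv:
    def __init__(self):
        self.reset()

    def reset(self):
        self.state = "Office"
        self.money = 10          # salary earned so far
        self.family_health = 5   # drops if the family is not fed or heated
        self.entrants_seen = 0
        return self._observation()

    def _observation(self):
        # The agent only ever sees numbers, not concepts such as "family".
        return [STATES.index(self.state), self.money,
                self.family_health, self.entrants_seen]

    def step(self, action):
        reward, done = 0.0, False
        if self.state == "Office":
            entrant_is_legal = random.random() < 0.7
            if action in ("Allow", "Deny"):
                correct = (action == "Allow") == entrant_is_legal
                reward = 1.0 if correct else -1.0   # salary vs. citation
                self.money += reward
                self.entrants_seen += 1
                # Systemic design: the home-life dilemma appears only after many entrants.
                if self.entrants_seen % 30 == 0:
                    self.state = "Home"
            elif action == "Accept_Bribe":
                reward = 2.0                        # short-term gain, no immediate penalty
                self.money += 2
        elif self.state == "Home":
            if action in ("Feed", "Heat"):
                self.money -= 1
            else:
                self.family_health -= 1
            done = self.family_health <= 0          # game over when the family is lost
            self.state = "Office"
        return self._observation(), reward, done
```

A reward-maximizing agent in this sketch would still be purely consequentialist; the environment only serves to expose systemic dilemmas, while the virtue-related incentives are added to the objective function in Sect. 4.4.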

Note that an artificial agent playing PP does not understand the concepts of immigration, family, compassion, or food; it does not have to. The goal of a virtuous agent playing the game is to achieve excellence in the relevant virtues by processing inputs in the form of binary and numeric values and then outputting a decision in the form of discrete or continuous actions (which are, again, numbers). The agent must strive to be virtuous in such a context. In addition to being an inspiration for developing environments that teach artificial agents virtues, the purpose of using a role-playing game is to give meaning to these binary and numeric inputs and outputs, thus making it easier for developers, researchers, and philosophers to understand the AVA.

4.3 Deep reinforcement learning

In a single-agent RL setting, the states S, actions A, transition probabilities T, and rewards R are modeled as a Markov decision process (MDP) \(\langle S, A, T, R \rangle\). Using optimization algorithms, an RL agent learns the best policy by optimizing either the policy directly or a value function (the expected return from being in a particular state s, or from a state-action pair (s, a)). When the state space is very large, for example in chess (a state-space complexity of roughly \(10^{43}\)), approximations are applied to simplify the state space. These approximations are made possible by neural networks whose inputs are the states and whose outputs are either the predicted value or the policy. These networks optimize an objective function parameterized by \(\theta\) using algorithms such as backpropagation. Various RL agents can be deployed to play systemic role-playing games, ranging from deep Q-learners (value optimizers) to actor-critic models (policy optimizers).
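For illustration, a minimal PyTorch-style sketch of such a value approximator is given below: a small network that maps a numeric observation vector to one estimated value per discrete action. The layer sizes and dimensions are arbitrary assumptions.

```python
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Approximates the action-value function Q(s, a) for a discrete action space:
    the input is a numeric state vector, the output is one value per action."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


# Example: 4 observation features (as in the environment sketch above), 8 actions.
q = QNetwork(state_dim=4, n_actions=8)
values = q(torch.tensor([[0.0, 10.0, 5.0, 0.0]]))  # shape: (1, 8)
greedy_action = values.argmax(dim=1)               # value-optimizing action choice
```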

The deep deterministic policy gradient (DDPG) algorithm [38] is an RL algorithm that learns, by trial and error, the value of state-action pairs. It uses this learned state-action value function to select those actions that maximize the expected discounted future reward. The value function is learned by a neural network \(Q(\theta _Q)\) (the critic), while the policy is learned by a separate neural network \(\mu (\theta _{\mu })\) (the actor). During learning, DDPG maintains a duplicate pair of target networks \(Q'(\theta _{Q'})\) and \(\mu '(\theta _{\mu '})\), whose parameters \(\theta _{Q'}\) and \(\theta _{\mu '}\) are updated slowly according to a soft-update rule: \(\theta '_i \leftarrow \tau \theta _i + (1-\tau ) \theta '_i\), where \(\tau \in [0,1]\) is usually a small number; a minimal sketch of this target-network update is given below. In the following subsection, we briefly discuss affinity-based RL and how it may be applied to represent virtues in AI.
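A minimal PyTorch-style sketch of the DDPG target-network soft update (variable names are hypothetical):

```python
import torch
import torch.nn as nn


def soft_update(target: nn.Module, source: nn.Module, tau: float = 0.005) -> None:
    """Target-network update: theta' <- tau * theta + (1 - tau) * theta'."""
    with torch.no_grad():
        for theta_prime, theta in zip(target.parameters(), source.parameters()):
            theta_prime.mul_(1.0 - tau).add_(tau * theta)


# After each gradient step on the critic Q and the actor mu (training loop not shown),
# the duplicate target networks Q' and mu' are nudged towards the learned networks:
#   soft_update(q_target, q_critic)
#   soft_update(mu_target, mu_actor)
```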

4.4 Affinity-based reinforcement learning

Affinity-based RL learns policies that are, at least partially, disjoint from the reward function, resulting in a homogeneous set of locally-optimal policies for solving the same problem [39]. Contrary to constrained RL, which discourages agents from visiting given states [40, 41], affinity-based learning encourages behavior that mimics a defined prior. It is a calculus suitable for modelling situations where the desirable behavior is somewhat decoupled from the global optimum. For example, a delivery van in Manhattan may prefer to take right turns over left turns, on the premise that this is a prudent safety measure [42]. While it reaches the destination in the end, it navigates along a different route than the global optimum: the shortest distance, which is what reward functions typically promote. The reasoning is that the deviation from the global optimum, and any corresponding penalty, is justified by other incentives, such as reduced risk in this case. It is thus compelling to motivate an agent to behave according to a given virtue, either globally or in a state-dependent fashion. For example, in PP the prior might define an action distribution that favors honesty 95% of the time and loyalty 5% of the time. An agent that selects actions according to this distribution can be classified as honest, compared to an agent that was encouraged to act more loyally during learning.
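As a toy illustration, such a global affinity could be expressed as a prior distribution over virtue-related actions, against which a trained agent's empirical behavior can be compared; the mapping of actions to virtues and the classification threshold below are hypothetical assumptions.

```python
import numpy as np

# Hypothetical global prior over two virtue-related actions in PP:
# favour honesty (rejecting bribes) 95% of the time, loyalty-driven income 5%.
AFFINITY_ACTIONS = ["Reject_Bribe", "Accept_Bribe"]
prior = np.array([0.95, 0.05])


def empirical_distribution(action_log):
    """Fraction of times each affinity action was chosen by a trained agent."""
    counts = np.array([action_log.count(a) for a in AFFINITY_ACTIONS], dtype=float)
    return counts / counts.sum()


# An agent that rejected 96 of 100 bribes behaves close to the honest prior.
observed = empirical_distribution(["Reject_Bribe"] * 96 + ["Accept_Bribe"] * 4)
classified_as_honest = abs(observed[0] - prior[0]) < 0.05
```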

Affinity-based learning uses policy regularization, which has significant potential for this application. It expedites learning by encouraging exploration of the state space and is never detrimental to convergence [43, 44]. Haarnoja et al. [45] proposed an entropy-based regularization method that penalizes any deviation from a uniform action distribution; it increases the entropy of the policy, thereby encouraging exploration of the entire state space. Galashov et al. [46] generalized this method with a regularization term that penalizes the Kullback-Leibler (KL) divergence \(D_{KL}\) between the state-action distribution of the policy and that of a given prior: \(D_{KL}(P \Vert Q) = \sum _{x \in X} P(x) \log \frac{P(x)}{Q(x)}\). Maree and Omlin [17] extended this concept to instill a global action affinity into learned policies, rather than to improve learning performance. They extended the DDPG objective function with a regularization term based on a specific prior:

$$\begin{aligned} J(\theta )&= {\mathbb {E}}_{s,a \sim {\mathcal {D}}} \left[ R(s,a) \right] - \lambda L \nonumber \\ L&= \frac{1}{M} \sum _{j=0}^{M} \left[ {\mathbb {E}}_{a \sim \pi _{\theta }}(a_j) - (a_{j} \vert \pi _{0}(a)) \right] ^2 \end{aligned}$$
(1)

where J is the objective function governed by parameters \(\theta\), \({\mathcal {D}}\) is the replay buffer, R(s, a) is the expected reward for action a in state s, \(\lambda\) is a hyperparameter that scales the effect of the regularization term L, M is the number of actions in the action space, \(\pi _{\theta }\) is the current policy, and \(\pi _0\) is the prior action distribution that defines the desired behavior. Maree and Omlin [47] demonstrated their method in a financial advisory application, where they trained several prototypical agents to invest according to the preferences associated with a set of personality traits; each agent invested in those assets that might appeal to a given personality trait. For instance, a highly conscientious agent preferred to invest in property, while an extraverted agent preferred buying stocks. While these agents optimized a single reward function, the maximization of profit, they learned vastly different strategies. To personalize investment strategies, Maree and Omlin [47] combined these agents according to individual customers’ personality profiles. The final strategy was a unique linear combination of the investment actions of the prototypical agents.
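A minimal NumPy sketch of the regularization in Eq. 1 is given below, under the (assumed) reading that both the policy's expected action probabilities and the prior \(\pi _0\) are vectors over the M discrete actions; the numbers in the example are hypothetical.

```python
import numpy as np


def affinity_regularizer(policy_probs: np.ndarray, prior_probs: np.ndarray) -> float:
    """L in Eq. 1: mean squared difference between the policy's expected
    action distribution and the desired prior distribution pi_0."""
    M = len(prior_probs)
    return float(np.sum((policy_probs - prior_probs) ** 2) / M)


def regularized_objective(expected_reward: float,
                          policy_probs: np.ndarray,
                          prior_probs: np.ndarray,
                          lam: float = 0.1) -> float:
    """J(theta) = E[R(s, a)] - lambda * L, the quantity the agent maximizes."""
    return expected_reward - lam * affinity_regularizer(policy_probs, prior_probs)


# Example: a policy that accepts bribes 20% of the time vs. an 'honest' prior of 5%.
pi_theta = np.array([0.80, 0.20])
pi_0 = np.array([0.95, 0.05])
J = regularized_objective(expected_reward=1.0, policy_probs=pi_theta,
                          prior_probs=pi_0, lam=0.1)
```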

Fig. 2 Affinity-based RL agent solving a systemic role-playing game. The agent takes virtuous actions by optimizing the regularized objective function and receives the next state and reward information from the game. Here, observations 1 to n represent the state. The text highlighted in red represents the affinity of the agent for taking action 2 when encountering a particular combination of observations

The combination of prototypical agents seems a promising approach to learning virtuous behavior: while individual virtues can be learned using policy regularization, a combination of these virtues might represent a rational agent; after all, we are not equally brave or honorable all the time. In this way, an agent actually becomes virtuous rather than utilitarian, i.e., solely dependent on the reward function. The other aspect of virtue ethics is practical wisdom, which is knowing to what degree an agent must exhibit a virtue depending on the situation. As opposed to the work done in [47], the combination of virtues may therefore vary over time as well as between individuals. One way of attaining such combinations could be through decision trees with a (partially observable) state vector as input. Another approach could be to extend the policy regularization term in Eq. 1 to specify a state-specific action distribution (Fig. 2), resembling KL-regularization. Formally, the regularization term L in Eq. 1 could be replaced by:

$$\begin{aligned} L = \sum _{s \in S} \pi _{\theta }(s) \cdot log \left( \frac{\pi _{\theta }(s)}{\pi _0(s)} \right) \end{aligned}$$

Thus, an agent may learn to act honorably in certain states, and bravely in others. Such a prior \(\pi _0\) should specify the desired action distribution as a function of the state variables; e.g., in PP a sick family member might prompt an agent to consider bravery 50% of the time, whereas a dying family member might elicit a higher rate (a sketch of such a state-dependent regularizer is given at the end of this subsection). This is a compelling generalization of global affinity-based RL to local affinity-based RL. Figure 2 illustrates the flow of information from the systemic role-playing game to the policy-regularized deep RL agent. Finally, once the agent is trained to make virtuous decisions in the game, it is crucial to investigate what the agent has learned from these experiences.
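Returning to the state-dependent prior \(\pi _0(s)\), the sketch below shows how such a local regularizer could be computed: a hypothetical mapping from state features to a desired action distribution, and the KL divergence between the policy's action distribution and that prior, accumulated over visited states.

```python
import numpy as np


def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """D_KL(p || q) for discrete action distributions."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))


def state_prior(state: dict) -> np.ndarray:
    """Hypothetical state-dependent prior over [cautious, brave] actions."""
    if state.get("family_member_dying"):
        return np.array([0.2, 0.8])   # a dying family member elicits a higher rate of bravery
    if state.get("family_member_sick"):
        return np.array([0.5, 0.5])   # consider bravery 50% of the time
    return np.array([0.9, 0.1])       # default: mostly cautious


def local_affinity_regularizer(visited_states, policy) -> float:
    """L = sum over visited states s of D_KL(pi_theta(.|s) || pi_0(.|s))."""
    return sum(kl_divergence(policy(s), state_prior(s)) for s in visited_states)
```

Here, `policy` is assumed to be a function that maps a state to the policy's action distribution \(\pi _{\theta }(\cdot \mid s)\).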

4.5 Interpretation of reinforcement learning agents

A virtuous agent is required to perform the right actions for the right reasons; it therefore becomes critical that its decisions can be scrutinized. At the same time, black-box architectures, such as recurrent neural networks (RNN) within the RL framework, are necessary to maintain good performance. Such a trade-off between interpretability and performance means that a balance must be struck between the two. In this paper, we use the words “explainability” and “interpretability” interchangeably, but we acknowledge the differences expressed in the literature [48]. The composition of prototypical agents is one way of achieving RL interpretability; other methods include explanations through a causal lens [49], reward decomposition [50] and reward redistribution [51].

The action influence model, introduced by Madumal et al. [49], takes inspiration from cognitive science to encode cause-effect relations using counterfactuals, i.e., events that could have happened alongside the ones that did. We may define a causal model for PP and, based on the action influence model, explain the decisions made by the agent. An alternative approach is the reward decomposition technique, where, in addition to the rewards associated with winning a game, the agent is also incentivized to maximize other reward functions. This is done by decomposing the overall Q-function into multiple elemental Q-functions and calculating differences in rewards using the reward difference explanation technique introduced in [50].
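As a small sketch of the reward-difference idea behind reward decomposition, suppose the overall Q-function is split into hypothetical components such as salary, family welfare, and honesty; an explanation for preferring one action over another can then be read off per component (all names and numbers below are illustrative assumptions, not outputs of [50]).

```python
# Hypothetical decomposed Q-values for two candidate actions in some PP state.
q_components = {
    "Accept_Bribe": {"salary": 2.0, "family_welfare": 1.0, "honesty": -3.0},
    "Reject_Bribe": {"salary": 1.0, "family_welfare": 0.5, "honesty": 1.0},
}


def total_q(action: str) -> float:
    """Overall Q-value is the sum of the elemental Q-functions."""
    return sum(q_components[action].values())


def reward_difference_explanation(chosen: str, alternative: str) -> dict:
    """Per-component Q-value difference: positive entries are the 'reasons'
    the chosen action is preferred over the alternative."""
    return {c: q_components[chosen][c] - q_components[alternative][c]
            for c in q_components[chosen]}


# Reject_Bribe has the higher total Q; the explanation shows that the honesty
# component outweighs the forgone salary and family welfare.
best_action = max(q_components, key=total_q)
explanation = reward_difference_explanation("Reject_Bribe", "Accept_Bribe")
```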

Another interesting approach is reward redistribution [51], where the expected return is approximated using an LSTM or alignment methods. In reward redistribution, the agent receives delayed rewards at the end of an episode, after every sub-goal, until, finally, the full reward after achieving the main goal. Hence, this approach is useful in episodic games such as PP, where the salary (reward) is paid at the end of the day, and the main goal of the agent is to keep their family alive using the salary. Finally, apart from the methods mentioned here, we motivate the use of affinity-based RL for better interpretability: since we define the distribution of virtues in the objective function, it becomes easier to understand the reason for the preference of certain actions over others.

5 Conclusions and future research directions

In this section, we outline some questions that arise as a result of our work, for instance: how could an artificial agent possibly exhibit virtuous behavior when it clearly lacks human agency and consciousness? And which virtues are artificial, and which are not? While these questions deserve articles of their own, we attempt to briefly discuss them here. After making the case for virtue ethics, we presented examples of role-playing games, such as Papers, Please, that embed ethics in the form of moral dilemmas, and we suggested possible approaches to solving such games. Here, we also suggest fruitful directions for future research in virtuous game design and learning algorithms.

We have purposely side-stepped the question of consciousness and moral agency. We are not concerned with conscious artificial agents, but with AI that exists today. And once again we stress that the virtues we present here are different from human virtues. For example, in the Nicomachean Ethics [8], Aristotle argues for the existence of virtues such as temperance and bravery. Such virtues can be thought of as exclusively human because we show emotions such as anger and fear, whereas, at this point, one cannot fathom an artificial agent exhibiting such emotions. Thus, it makes sense to think about a different set of virtues for artificial agents.

Artificial virtues can be thought of as character traits for current-day artificial intelligence. A starting point is to consider virtues such as honesty (degree of truthfulness), perseverance (how much to compute), and optimization (how much to fine-tune), as demonstrated in [30]. However, unlike [30], we are compelled to progress from mere machine learning towards designing virtuous AI. We consider virtues to be continuous variables; an agent’s challenge is to find the golden mean for a given virtue. We will elaborate on this aspect of virtues in future work.

Previous work has proposed POMDPs [16], inverse RL [15] and deep neural network frameworks [7] as possible means to implement artificial virtues. While these are widely adopted models of machine learning, we recognize that there is a danger that such models might be perceived as consequentialist. There needs to be something more than the reward function motivating virtuous behaviour. Techniques that work directly on the objective function to encourage certain behaviours may need to work in tandem with the reward function. For example, [17] have shown theoretical evidence of agent characterization through policy regularization. Such affinity-based RL methods also help improve the explainability of models, which is crucial with respect to virtues, as we highlighted in earlier sections.

Finally, it is important to consider the data or environment used to train such agents, as these influence the model’s performance. The framework of systemic role-playing games highlighted in Papers, Please [13] provides a reasonable model of how to integrate ethical dilemmas into an environment, such that these ethical aspects arise as the agent plays the game and learns to adjust its decision-making based on feedback received from the environment. Depending on the model and the environment used, it may also be fruitful to see how multiple virtuous agents behave when they are at odds. Overall, this paper furthers the conversation on the implementation of ethical machines, which is a nascent research area.