Leverage AI to Create Autonomous Policies that Learn and Adapt without Human Intervention
Policies are the foundation for any successful organization. Policies are the rules, or laws, of an organization. Policies document the principles, best practices and compliance guidelines that aid decision-making in supporting the consistent and repeatable operations of the business. Heck, one could argue that an organization’s culture is better defined by its policies than it is by the character of its leadership team.
Unfortunately, the management, creation and execution of policies haven’t changed much since the days of “time-and-motion studies”. In many cases, policies are nothing more than a static list of what-if rules that govern what workers are to do in well-defined situations. For example, [If your car has been driven over 3,000 miles since the last oil change, then change the oil] or [If you haven’t visited the dentist in more than 6 months, then visit the dentist].
But what if…what if these policies weren’t just static if-then rules but were instead AI-based models that changed to optimize the actions based upon the constantly evolving state of the environment in which the business operates…without human intervention?
In much the same way that we are seeing AI being used to create autonomous vehicles, robots and devices that learn and adapt without human intervention, can we leverage AI to create autonomous policies that learn and adapt without human intervention?
Creating Autonomous Policies
First, let’s modernize our definition of “policy”:
A “Policy” is a codified set of agent-based (human or machine) analytics that guide actions (or decisions) based upon current state (or environment) that optimize, automate and operationalize (scale) an organization’s business and operational models.
And my hypothesis is this: if a policy can be documented and automated, then it can be integrated with AI / ML to become autonomous so that the policies and procedures learn and adapt without human intervention. Policies that self-monitor, self-diagnose, self-learn and self-change/evolve?
Identify → Document → Codify → Automate + AI/ML/DL yields Autonomous policies that learn and continuously evolve without human intervention
If an autonomous device can gain information about the environment through interaction, learn through those interactions and update its operating model without human intervention, then why can’t the policies that support the operations of the business do the same thing?
Achieving autonomous policies hinges on the ability to apply AI, specifically Deep Reinforcement Learning, to the governance and evolution of these policies. Deep Reinforcement Learning combines a deep Neural Network (for example, a Convolutional Neural Network for image recognition and classification) with Reinforcement Learning, so that autonomous agents can learn and improve their operational effectiveness. Combining convolutional neural networks (CNN) with Reinforcement Learning allows the agent to recognize its current state and rank the best actions to perform given that current state.
The goal of Deep Reinforcement Learning is for an autonomous “agent” to learn a successful strategy from continuous engagement with the environment. With the optimal strategy, the agent can actively adapt to the changing environment to maximize rewards (current and future) while minimizing costs (see Figure 1).
Figure 1: An agent interacts with its environment, trying to take actions to maximize cumulative rewards
Deep Reinforcement Learning factors in Figure 1 include:
- State: Current position of all entities within the surrounding environment. For an autonomous vehicle, it would be the location, direction and speed of all the surrounding entities, including other cars, cyclists and pedestrians.
- Action: Inventory of potential actions. For our autonomous vehicle, that inventory of potential actions could include turn, stop, slow down, accelerate, and reverse.
- Rewards: Maximizing Positive Rewards (safely navigating to the next location, safely reaching the final destination) while minimizing Negative Rewards (crashing, wasting fuel, increasing carbon emissions, injuring others, getting traffic tickets)
- Policy: State-to-action mapping that defines what actions the agent should take in a given situation in order to maximize its positive rewards while minimizing its negative rewards
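To make the four factors concrete, here is a minimal sketch of how an agent might learn a state-to-action policy from rewards. It deliberately swaps in plain tabular Q-learning rather than Deep Reinforcement Learning (no neural network), and everything in it is an assumption for illustration: the five-position corridor world, the reward values, and the names `step`, `train` and `greedy_policy`.

```python
import random

N_STATES = 5                  # hypothetical toy world: positions 0..4, goal at 4
ACTIONS = ("left", "right")   # inventory of potential actions

def step(state, action):
    """Environment: return (next_state, reward, done) for one move."""
    if action == "left":
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.01   # positive reward at the goal, small cost per move
    return next_state, reward, done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: estimate a value for every (state, action) pair."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore occasionally; otherwise exploit the best known action.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Nudge the estimate toward reward now plus discounted future reward.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

def greedy_policy(q):
    """Policy: map each non-goal state to its highest-valued action."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}

policy = greedy_policy(train())
```

Nobody writes the rule "move right" into this agent. The policy emerges from the rewards, which is the property an autonomous policy would need.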
For example, today we have a societal policy, or rule, dictating what drivers are supposed to do when they arrive at an intersection at the same time. When two vehicles arrive at a 4-way stop at the same time, are located head-to-head, and one intends to turn right while the other intends to turn left, the vehicle turning right has the right of way. The driver turning right should move forward slowly before entering the intersection to indicate to other drivers that they are making the turn. The driver turning left should wait until the other car has fully passed (see Figure 2).
Figure 2: “The 4 Rules of 4-Way Stops”
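The head-to-head rule above is a classic static if-then policy. As a minimal sketch, here is how it might be hard-coded. The function name `right_of_way` and the string-valued turn intents are hypothetical, chosen only for illustration, and the function covers only the one situation described above.

```python
def right_of_way(intent_a, intent_b):
    """Static rule for two head-to-head vehicles arriving at a 4-way stop together.

    Returns "a" or "b" for the vehicle that goes first, or None when this
    particular rule does not cover the situation.
    """
    if intent_a == "right" and intent_b == "left":
        return "a"   # the right-turning vehicle has the right of way
    if intent_a == "left" and intent_b == "right":
        return "b"
    return None      # the rule is silent; it never learns what to do here

first = right_of_way("left", "right")
```

The rule gives the same answer at 3 a.m. and at rush hour, and it has no way to improve from experience.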
However, in a world of autonomous vehicles, those if-then rules to guide safe decisions navigating an intersection just won’t work. The promise of flawless traffic and reduced traffic congestion would give way to a series of frustrated autonomous vehicles starting and stopping at the intersection.
So instead of the old policy for determining to whom to defer when multiple cars arrive at the intersection at the same time, we’d have to develop a new policy that can continuously learn and evolve as the flow and density of traffic patterns changes throughout the day and in response to special events and situations (see Figure 3).
Figure 3: Enterprise TV Commercial, “The Future of Transportation”
The autonomous vehicle (agent) must constantly monitor, diagnose and learn in order to actively adapt to the changing environment to maximize future rewards while minimizing costs without human intervention.
Creating Autonomous Policies
If an autonomous device or vehicle can gain information about the environment through interaction, learn through those interactions and update its operating model without human intervention, then why can’t the policies that support the operations of the business do the same thing? For example:
- A Predictive Maintenance Policy that, for each part, takes into consideration such factors as remaining useful life, demand forecasts, product performance anomaly detection, costs of that inventory, amount of that inventory, location of that inventory and the relative importance of that part to overall operations to make its prescriptive recommended actions. For example, in operating a hotel, one might accept a less accurate Predictive Maintenance policy for light bulbs, which have minimal importance, than for an air conditioning unit or an elevator, for which downtime could have a substantial impact on profits and guest satisfaction.
- An Inventory Optimization Policy that, for each inventory item, takes into consideration such factors as demand forecast, remaining useful life, supplier reliability anomaly detection, current inventory levels and inventory locations, projected obsolete inventory, etc. to make its prescriptive recommended actions.
- A Customer Retention Policy that, for each customer, takes into consideration such factors as purchase or engagement history, purchase or engagement anomaly detection, current lifetime value for that customer and Predicted Customer Lifetime Value to make its prescriptive recommended actions. Predicted Customer Lifetime Value would itself take into consideration such factors as current life stage, because if you are now an empty-nester, you are probably suddenly less valuable to a Theme Park operator.
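As a rough illustration of how one of these business policies might learn from outcomes, here is a minimal sketch of an epsilon-greedy policy that keeps a running average reward for each (state, action) pair. This is a simple bandit-style stand-in, not Deep Reinforcement Learning, and all of its names are hypothetical: the class `AdaptivePolicy`, the customer state `"at_risk"`, the retention actions and the reward values.

```python
import random
from collections import defaultdict

class AdaptivePolicy:
    """Epsilon-greedy policy tracking the average observed reward per (state, action)."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon              # share of decisions spent exploring
        self.rng = random.Random(seed)
        self.value = defaultdict(float)     # running average reward
        self.count = defaultdict(int)       # number of observations

    def best_action(self, state):
        """Action with the highest average reward observed so far."""
        return max(self.actions, key=lambda a: self.value[(state, a)])

    def choose(self, state):
        """Mostly exploit the best known action, occasionally try another."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.best_action(state)

    def update(self, state, action, reward):
        """Fold one observed outcome into the running average."""
        key = (state, action)
        self.count[key] += 1
        self.value[key] += (reward - self.value[key]) / self.count[key]

# Hypothetical outcomes: 1.0 means the customer stayed, 0.0 means they left.
policy = AdaptivePolicy(actions=["discount", "loyalty_points", "no_action"])
policy.update("at_risk", "discount", 1.0)
policy.update("at_risk", "loyalty_points", 0.4)
policy.update("at_risk", "no_action", 0.0)
before = policy.best_action("at_risk")

# The environment changes: discounts stop working, so the recommendation shifts.
policy.update("at_risk", "discount", 0.0)
policy.update("at_risk", "discount", 0.0)
after = policy.best_action("at_risk")
```

In this sketch the recommendation for an at-risk customer moves from the discount to loyalty points once the discount's observed results decline, with no one editing a rule. A production policy would draw its state from the factors listed above rather than a single label.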
Summary: Creating an Autonomous Business
Using Deep Reinforcement Learning, we can transition from static policies to autonomous policies that learn how to map any given situation (or state) to an action to reach a desired goal or objective without human intervention. These autonomous policies would dynamically learn and update in response to constantly changing environmental factors (such as changes in weather patterns, economic conditions, price of commodities, trade and deficit balances, global GDP growth, student debt levels, fashion trends, Cubs winning the World Series, etc.).
Do autonomous policies – policies that are constantly learning and updating based upon changing environmental factors – lead to an autonomous business? Is this the modern math of the Autonomous Business?
Autonomous Policies = Identify → Document → Codify → Automate + AI/ML/DL yields Policies that learn and continuously evolve without human intervention
That’s something to consider over a Guinness or three on my next trip to Ireland.