In my prior Forbes blogs, I outlined key reasons for board directors and CEOs to advance their AI governance practices and reinforced the imperative of recognizing AI as a top-ten security risk, as reported in EY's Global Risk Survey.
In last week's blog, Cathy Cobey, EY Global Trusted AI Lead, and I discussed the maturity of third-party audits and certifications of AI models and systems. We concluded that the industry is still in a very immature state: organizations like IEEE and ISO are working on AI standards, but these won't be available until 2021.
This blog discusses the acceleration of data bias in AI models, introduces five types of data bias, and identifies key governance questions for board directors and CEOs to ask their organizations in order to mitigate data bias risks.
Why care about data bias?
Well, first, we have already experienced a tenfold increase in data between 2013 and 2020. According to IDC's Data Age 2025 report, data creation will grow to 163 zettabytes by 2025, roughly ten times the amount produced in 2017. In other words, there is no end in sight to data production, and its accelerating growth will require stronger board governance and CEO leadership to ask the right questions and keep organizations modernizing their AI practices.
Did you know that IBM has defined and classified over 180 human biases that can affect how we make decisions? As data volumes increase, so does the risk of making wrong decisions from that data. Bias implies interference from predetermined approaches, ideas, prejudice, or other influences on how data is interpreted.
So how do biases find their way into AI systems? We cannot ignore that bad data used to train AI can contain many forms of data bias, including gender, racial, and ideological bias, and can erode trust between humans and machines. To manage these risks, board directors and CEOs must have qualified, skilled experts in their organizations to manage data bias. Five forms of data bias provide a basic foundation of knowledge to guide leaders forward.
The first is selection bias, which occurs when the data set selected is not representative of the population being analyzed. For example, suppose you are analyzing your current customer database to predict who has the propensity to purchase more, but your current customer base does not have the right profile to achieve your strategic growth goals. You may need to augment your internal data set with an external one to build a more representative population for your AI models. At the same time, you will need to think about how to continue to fuel the data set if, for example, you want to advance to a global population data set. Data scientists must ensure that results are applicable to the entire population and avoid false extrapolations. In other words, when building AI models, data is like fuel: without the right fuel, you will not reach your desired destination.
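A simple, hedged sketch of how a data team might surface selection bias before any modelling begins: compare a summary statistic of the data you have against the market you actually want to serve. The customer segments, age figures, and gap threshold below are all hypothetical, purely for illustration.

```python
import random
import statistics

random.seed(42)

# Hypothetical population: the target market has two customer segments,
# but our existing customer base only covers the older segment.
existing_customers = [random.gauss(55, 5) for _ in range(1000)]          # older segment only
target_market = existing_customers + [random.gauss(30, 5) for _ in range(1000)]  # full market

sample_mean = statistics.mean(existing_customers)
population_mean = statistics.mean(target_market)

# A large gap between sample and population statistics is a basic
# red flag for selection bias: the training data is not representative.
gap = abs(sample_mean - population_mean)
print(f"sample mean age:     {sample_mean:.1f}")
print(f"population mean age: {population_mean:.1f}")
print(f"gap:                 {gap:.1f}")
```

In practice this check would be run across many attributes (geography, segment, income), not just one, but the principle is the same: profile the sample against the population before trusting a model built on it.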
The second is confirmation bias, which occurs when a data scientist sets out to prove a predetermined hypothesis and intentionally excludes variables from the analysis to reach a predetermined goal. This is a frequent occurrence, and another reason why careful statistical reviews of data sets should be undertaken by experts.
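To make the risk concrete, here is a hypothetical sketch of how cherry-picking records can manufacture support for a predetermined hypothesis. The A/B test scenario and all numbers are invented for illustration: the "treatment" has no real effect, yet keeping only favourable rows makes it look like it does.

```python
import random
import statistics

random.seed(0)

# Hypothetical A/B test where the treatment truly has NO effect:
# both groups are drawn from the same distribution.
control = [random.gauss(100, 10) for _ in range(500)]
treatment = [random.gauss(100, 10) for _ in range(500)]

# Confirmation bias in action: keep only the treatment rows that
# support the predetermined hypothesis ("treatment lifts the score").
cherry_picked = [x for x in treatment if x > statistics.mean(control)]

honest_lift = statistics.mean(treatment) - statistics.mean(control)
biased_lift = statistics.mean(cherry_picked) - statistics.mean(control)

print(f"honest lift:        {honest_lift:.2f}")   # close to zero
print(f"cherry-picked lift: {biased_lift:.2f}")   # looks impressive
```

An expert statistical review should catch exactly this pattern: a pre-registered analysis plan, applied to the full data set, would show no effect.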
The third type is overfitting or underfitting bias. Overfitting occurs when a model is overly complicated and captures noise in the training data rather than the underlying reality; underfitting occurs when a model gives an oversimplified view of reality.
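A minimal sketch of both failure modes, using invented data where the true relationship is a straight line. The underfit model ignores the input entirely; the overfit model memorizes the training set (a 1-nearest-neighbour lookup, used here purely as an illustration of memorization); a simple least-squares line matches the true process.

```python
import random
import statistics

random.seed(1)

# Hypothetical data: y depends linearly on x, plus noise.
def make_data(n):
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + 1 + random.gauss(0, 1) for x in xs]
    return xs, ys

train_x, train_y = make_data(100)
test_x, test_y = make_data(100)

def mse(pred, ys):
    return statistics.mean((p - y) ** 2 for p, y in zip(pred, ys))

# Underfitting: a constant model ignores x entirely (too simple).
const = statistics.mean(train_y)
underfit_test = mse([const] * len(test_y), test_y)

# Overfitting: memorize the training set (1-nearest-neighbour lookup).
def nearest(x):
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]

overfit_train = mse([nearest(x) for x in train_x], train_y)  # zero: pure memorization
overfit_test = mse([nearest(x) for x in test_x], test_y)     # worse on unseen data

# Right-sized model: ordinary least-squares line.
mx, my = statistics.mean(train_x), statistics.mean(train_y)
slope = sum((x - mx) * (y - my) for x, y in zip(train_x, train_y)) / \
        sum((x - mx) ** 2 for x in train_x)
intercept = my - slope * mx
fit_test = mse([slope * x + intercept for x in test_x], test_y)

print(f"underfit (constant) test MSE:  {underfit_test:.1f}")
print(f"overfit train MSE: {overfit_train:.1f}, test MSE: {overfit_test:.1f}")
print(f"well-fit (linear) test MSE:    {fit_test:.1f}")
```

The telltale governance signal is the gap between training and test error: perfect on the data the model has seen, poor on data it has not.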
The fourth type is outlier bias. Outliers are values that fall far above or below the region of all the other values, and they significantly affect the estimation of statistics (e.g., the average and standard deviation of a sample), resulting in overestimated or underestimated values. Observations driven by outliers are dangerous to base decisions on; clusters of data points that sit closer together create stronger foundations for drawing more accurate conclusions.
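One standard way teams flag outliers is Tukey's fences (values beyond 1.5 times the interquartile range). The sketch below uses an invented sales series with two hypothetical data-entry errors to show how badly a couple of outliers can distort an average.

```python
import statistics

# Hypothetical daily sales figures with two data-entry errors at the end.
sales = [102, 98, 105, 99, 101, 97, 103, 100, 96, 104, 9_800, 10_050]

def iqr_outliers(values):
    # Flag points beyond 1.5 * IQR from the quartiles (Tukey's fences).
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

outliers = iqr_outliers(sales)
clean = [v for v in sales if v not in outliers]

print(f"outliers flagged:      {outliers}")
print(f"mean with outliers:    {statistics.mean(sales):.0f}")
print(f"mean without outliers: {statistics.mean(clean):.0f}")
```

Whether flagged points should be removed or investigated is a judgment call; the governance point is that they must be detected and reviewed before a model is trained on them.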
The fifth area is confounding variable bias, which arises when a variable outside the scope of the analytical model influences the results. Positive confounding occurs when the observed association is biased away from the null; in other words, it overestimates the effect. Negative confounding occurs when the observed association is biased toward the null; in other words, it underestimates the effect.
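A hedged sketch of positive confounding, with an entirely hypothetical scenario: seasonality drives both marketing spend and sales, spend has no direct effect, yet the two look strongly correlated until the confounder is adjusted for.

```python
import random
import statistics

random.seed(7)

def corr(xs, ys):
    # Pearson correlation coefficient, computed from scratch.
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical confounder: season drives both marketing spend and sales.
# Spend has NO direct effect on sales here, but the shared driver makes
# them look strongly associated (positive confounding).
season = [random.gauss(0, 1) for _ in range(1000)]
spend = [s + random.gauss(0, 0.3) for s in season]
sales = [s + random.gauss(0, 0.3) for s in season]

naive = corr(spend, sales)

# Adjusting for the confounder (subtracting the season effect)
# removes most of the apparent association.
adj_spend = [sp - se for sp, se in zip(spend, season)]
adj_sales = [sa - se for sa, se in zip(sales, season)]
adjusted = corr(adj_spend, adj_sales)

print(f"naive correlation(spend, sales): {naive:.2f}")
print(f"adjusted correlation:            {adjusted:.2f}")
```

The lesson for governance: ask whether the model's variables include the plausible common drivers, because an association measured without them can wildly overstate (or understate) the true effect.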
There are many cases of data bias: Amazon's recruiting model that penalized women and was scrapped, labour market discrimination, gender misclassification, and many of the world's image databases fraught with bias. Other forms of data bias exist in data sets, but those noted above are common areas to watch out for.
Seven questions that, as a board director or CEO, you can ask tomorrow morning to mitigate your data bias risks and lead forward with strong AI practices are:
1. How do you detect data bias in your AI practices and methods?
2. What are the tools you have to detect data bias in your AI models? How robust are the methods?
3. Do you have experts trained in statistics and data bias on your AI engineering teams who validate that the data set is representative of the problem being explored with AI methods?
4. Given that data in AI models can be a high cybersecurity risk, how much training have your executives and managers had to understand the risks of data bias and ensure robust operational controls are in place?
5. If you find data bias in your data set, how are the risks logged, and is there a clearly defined risk mitigation plan to ensure that no AI data modelling is initiated until a data bias risk review has been signed off?
6. What metrics do you have to measure the types of data bias in your AI models and are the risks classified so you have a continuous learning environment?
7. Have you had third-party validation by an external data bias expert appropriate to your data model risk classification levels?
One of the challenges with AI is that, given its ability to handle volume, it can easily scale data bias up to higher risk levels if it is not caught early in the discovery and validation process. To wrap up this blog on AI data bias, a good video to watch is IBM researcher Francesca Rossi discussing AI bias, where she covers the importance of building systems that make decisions, or guide humans to better decisions, without letting data bias go unchecked. To read more about this topic, refer to the McKinsey article on AI bias.
My next blog will identify diverse sources for learning more about AI and data bias, so that as a board director or CEO you can sleep at night knowing your organization is acutely skilled in identifying data bias and managing the risk, and will not become the next legal case for getting this important issue wrong.
For more questions addressing board director and CEO Governance on AI, refer to Dr. Cindy Gordon’s Forbes blog roster.
Research: If you know of a company with an AI expert on a publicly traded board, please send an email to email@example.com.
Dr. Cindy Gordon is a CEO, thought leader, author, keynote speaker, board director, and advisor to companies and governments striving to modernize their business operations with advanced AI methods. She is the CEO and Founder of SalesChoice, an AI SaaS B2B company focused on Improving Sales Revenue Inefficiencies and Ending Revenue Uncertainty. A former Accenture, Xerox, and Citicorp executive, she bridges governance, strategy, and operations in her AI contributions. She is a board advisor of the Forbes School of Business and Technology and the AI Forum. She is passionate about modernizing innovation with disruptive technologies (SaaS/Cloud, Smart Apps, AI, IoT, Robots), with 13 books in the market and a 14th, The AI Split: A Perfect World or a Perfect Storm, to be released shortly. Follow her on LinkedIn or Twitter, or visit her Website. You can also access her at The AI Directory.