Those who have worked on Machine Learning (ML) projects know that ML requires a large amount of data to train its algorithms. Some would say you can never have too much data. There is usually a correlation between the amount of data and the sophistication of the resulting ML model. This data hunger will only intensify as AI progresses toward new benefit pools and leverages more sophisticated AI capabilities. Because other trends besides the sophistication of AI also contribute, a question looms for organizations: “Do we have the right data to fuel successful AI efforts?” If they don’t have enough, should they stock up in anticipation of the AI feast?
Figure 1: The AI / Data Continuum
It is unlikely that all the big data organizations have been hoarding is the right data. However, understanding where AI is going will give an organization a “leg up” on culling existing data and collecting more of the right data as AI progresses over the coming decades.
The Progression of AI Changes the Data Game
While ML requires significant amounts of data to self-modify its behavior, the appetite of AI grows quickly as the sophistication of its capabilities increases. The step from ML to Deep Learning (DL) is a big one, because DL requires much more data than ML. This is because DL distinguishes between concepts through the layers of its neural networks, and it determines where the edges of those concepts lie only after being exposed to millions of data points. DL lets machines represent concepts through neural networks in a way loosely inspired by the human brain, which allows more complex problem-solving. AI can also work on fuzzier problems where the answers are more uncertain or ambiguous. These are typically judgment or recognition problems that can extend to creative and other “right-brained” activities. This again requires more data, which in some cases may be emergent or real-time in nature.
The Shift from Data-Driven to Outcome-Driven
As AI takes on more sophisticated problems to assist with or solve, it will become both data-driven and goal/outcome-driven. This means the AI may request, on the fly, the data it needs to solve a particular problem or make a specific deduction, which complicates data management. It may involve inductive, data-driven portions of a solution interacting with deductive, hypothesis-based requests for data aimed at reaching a target. Outcome-oriented problems need this kind of dynamic interaction, which is very different from simply interrogating data for interesting events and patterns. Decision-driven approaches fall right in the middle of these two distinct approaches. Some decisions are operationally focused and are improved by matching data with outcomes. More strategic decisions will draw on both inductive and deductive approaches. This is yet another channel of demand that will boost data usage.
The Shifting Problem Scopes Impact Data Needs
AI solutions will typically start narrow in scope and widen over time, requiring more data as they do. Complex solutions typically target more than one answer and will require more data to support the tributary solution sets that contribute to a complex/hybrid result. As decisions, actions, and outcomes span more contexts inside and outside an organization, more data will be needed to understand each context and how the contexts interact. Each of these contexts may be changing and morphing at a different rate, requiring still more data.
It’s clear that more data will be the hallmark of AI-assisted solutions. The data appetite might come from more challenging problems, better leverage of advanced AI/analytics, or growing end-to-end value chains. One thing is certain: organizations had better get ready for the new world of “AI/Data Interaction.” It could change or extend data management policies, methods, techniques, or technologies. Refer to Figure 1 to see the interaction possibilities.