To get more value from their data, companies today are capturing huge amounts of internal and external data and hiring new talent to manage and exploit these new assets. Yet few organizations manage to realize the full potential of their data and analytics capabilities. One of the main reasons is that digital transformation is first and foremost a people transformation, and changing a culture is much harder than acquiring new assets and capabilities.
Changing your organizational culture
An organization's culture shapes how people collaborate and interact, how they make decisions, and how they run every aspect of the business. In a data-driven organization, data and analytics are treated in every activity as an opportunity for differentiation from competitors. To get there, everyone in the organization should be empowered to take advantage of data and analytics in their own context. This requires giving people access to the relevant data and democratizing the use of artificial intelligence algorithms for non-experts.
Democratizing data science in your organization
Empowered employees can then collaborate and experiment easily with the available data and algorithms to create value for the organization in several ways: generating new insights, fostering data-driven decision-making, testing new business models, or optimizing existing processes.
Uber, for instance, has taken this path by developing its own data science platform, which provides validated forecasting algorithms at the push of a button to anyone in the company, no forecasting expertise required. https://eng.uber.com/building-ubers-data-science-platforms/
The data science platform landscape is changing fast. Gartner (an IT consulting firm) publishes an annual overview of the big players, ranking data science and machine learning platforms along two axes: completeness of vision and ability to execute.
Even if many vendors still try to stand out by packing ever more advanced features into their platforms, which makes them more suitable for expert users, there is a clear trend toward democratizing these platforms for non-expert data scientists, also called “citizen data scientists”. DataRobot, for instance, entered the 2019 Gartner Magic Quadrant as a Visionary, partly thanks to its ability to simplify the use of machine learning algorithms for citizen data scientists by leveraging the latest advances in automated machine learning.
“The ability to empower users to quickly and easily build highly accurate predictive models with full transparency is perhaps the most important element of any successful machine learning platform. The only ingredients needed are curiosity and data — coding and machine learning skills are completely optional!” (DataRobot)
Changing Your Relationship with data
The second main reason why few organizations are able to take full advantage of artificial intelligence is related to how data is managed across the organization. Indeed, while modern data science platforms make it easy to test and compare hundreds of machine learning algorithms at the click of a button, it is still a challenge for many organizations to collect and prepare the right training dataset to feed these algorithms.
Data preparation can be time-consuming: data scientists report that it can take up to 80% of the time spent on a project. But it is not a roadblock, as data science platforms provide increasingly user-friendly data-wrangling capabilities.
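To make the wrangling step concrete, here is a minimal sketch in plain Python of the kind of cleanup that eats up that time: trimming text fields, coercing numeric strings, and dropping unusable rows. The field names ("product", "score") are hypothetical examples, not taken from any particular platform.

```python
# Minimal data-wrangling sketch: normalize raw survey rows before analysis.
# Field names ("product", "score") are hypothetical examples.

def clean_rows(raw_rows):
    """Trim text fields, coerce scores to float, and drop unusable rows."""
    cleaned = []
    for row in raw_rows:
        product = (row.get("product") or "").strip().lower()
        try:
            score = float(row.get("score", ""))
        except ValueError:
            continue  # drop rows with a non-numeric score
        if product:  # drop rows with a missing product name
            cleaned.append({"product": product, "score": score})
    return cleaned

raw = [
    {"product": "  Widget ", "score": "4.5"},
    {"product": "Gadget", "score": "n/a"},  # dropped: bad score
    {"product": "", "score": "3.0"},        # dropped: missing product
]
print(clean_rows(raw))  # [{'product': 'widget', 'score': 4.5}]
```

Real platforms wrap exactly this sort of logic behind point-and-click interfaces, which is what makes them accessible to non-experts.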
On the other hand, collecting the relevant data in a coherent way across the organization can be a real issue. In many cases, artificial intelligence initiatives are stopped because important data is not available or not usable.
To illustrate this last point, let’s look at the way organizations create value for their customers. This is often achieved through several interconnected processes (marketing, finance, research and development, manufacturing, purchasing, supply chain, and so on). When you look at these processes from the information-systems point of view, you usually find a patchwork of different software, databases and Excel files that are rarely coherent, at least in the way data is structured and stored.
The classical way organizations tackle this problem is by launching huge digitalization programs that deploy new information systems and restructure existing ones. Such programs are risky, expensive, and take several years to achieve their goals, so any competitive advantage from data and analytics is delayed even further.
Bezos API mandate
Amazon’s approach in this domain is worth mentioning: in 2002, Jeff Bezos, the founder and CEO of Amazon, issued the following memo (later called the “Bezos API mandate”) to his employees:
- All teams will henceforth expose their data and functionality through service interfaces.
- Teams must communicate with each other through these interfaces.
- There will be no other form of inter-process communication allowed: no direct linking, no direct reads of another team’s data store, no shared-memory model, no back-doors whatsoever. The only communication allowed is via service interface calls over the network.
- It doesn’t matter what technology they use.
The memo ended by saying “Anyone who doesn’t do this will be fired. Thank you; have a nice day!”.
If we put aside the authoritarian way the note was formulated and look at the consequences of such an orientation, it means that when someone in the company needs information from marketing or any other team, instead of reaching out to colleagues by email or phone to get a database extract or an Excel file, they can now use a codified service interface to get the data. And whatever technology the marketing team uses to manage and process that data (from advanced commercial software to a simple spreadsheet), it has to expose the data through an application programming interface anyone in the company can use.
This may sound strange in terms of human relationships, but it has the big advantage of standardizing the way data is structured and shared, no matter how mature the organization’s information system is.
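The principle behind the mandate can be illustrated with a toy, in-process sketch (this is not Amazon’s implementation; the team, method, and data are invented): a team’s data store stays private, and the only sanctioned way in is the service interface it publishes.

```python
# Toy illustration of the API-mandate principle (hypothetical example):
# a team's data store stays private; consumers go through a service interface.

class MarketingService:
    """Hypothetical marketing team service. Its public methods are the contract."""

    def __init__(self):
        # Private store: under the mandate, no other team reads this directly.
        self._campaign_spend = {"widget": 12000, "gadget": 8500}

    def get_campaign_spend(self, product_id: str) -> int:
        """Service interface call: the only sanctioned way to get spend data."""
        return self._campaign_spend.get(product_id, 0)

# Any consumer, whatever its own technology, uses the same interface call:
marketing = MarketingService()
print(marketing.get_campaign_spend("widget"))  # 12000
```

In a real deployment the call would go over the network (typically HTTP), but the contract idea is the same: consumers depend on the interface, never on the underlying storage.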
For the record, AWS (Amazon Web Services), the Amazon subsidiary that provides on-demand cloud computing, generated $8.4 billion in sales and accounted for 13% of Amazon’s total revenue in Q2 2019, as reported by the company.
Next generation platforms
In my research, I did not find any data science platform going beyond the data-preparation and data-modeling phases. Most of them completely overlook the data-management part and assume the user can bring their own coherent dataset, or connect to existing databases, while handling all the collaboration and security aspects themselves.
In this section, let’s imagine a next-generation platform where users have a seamless experience, from managing their data sources to leveraging artificial intelligence algorithms to create value for the organization.
To do so, we’ll tell the fictional story of Barbara, a product owner in the marketing department.
Barbara has just finished a customer satisfaction survey in which she collected thousands of pieces of customer feedback about a wide range of products her company is developing. She then had the idea of merging this data with the advertising campaign budgets for each product and a set of product characteristics provided by the research and development department, in order to find out which combination of parameters is correlated with customer satisfaction.
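The kind of analysis Barbara has in mind can be sketched in a few lines of plain Python: join the datasets on a shared product key, then measure how one parameter moves with satisfaction. All the data and field names below are invented for illustration.

```python
import math

# Sketch of Barbara's idea (hypothetical data): join survey scores with
# ad budgets by product, then check how the two variables move together.

satisfaction = {"widget": 4.5, "gadget": 3.1, "doohickey": 4.0}      # survey averages
ad_budget = {"widget": 12000, "gadget": 4000, "doohickey": 9000}     # campaign spend

# Inner join on the shared product key.
products = sorted(satisfaction.keys() & ad_budget.keys())
xs = [ad_budget[p] for p in products]
ys = [satisfaction[p] for p in products]

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(f"correlation(ad budget, satisfaction) = {pearson(xs, ys):.2f}")
```

Of course, correlation is only a starting point for the richer modeling a platform would offer, but the join-then-analyze pattern is the core of her idea.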
Browsing the data catalogue
Excited to test her idea, Barbara connects to the data science platform her company has just deployed. She can now browse the catalogue of all the data sources available in the organization. This time, Barbara prefers to ask the integrated chatbot for the type of information she is looking for; the bot quickly directs her to a shortlist of interesting data sources. She goes through the list, reading the description of each data source, and quickly identifies the ones she needs for her analysis.
Creating a new data source
Now it is time for Barbara to upload her own dataset onto the platform. Once she has uploaded the file and written a short description of its content for her colleagues, the system automatically recognizes that the column used to identify customers in the survey is a reference column used in other data sources too. It suggests that Barbara connect her dataset to the customers’ referential. It also suggests removing the customer-specific columns that already exist in the referential, since they would be duplicates.
Barbara is delighted with these suggestions; she is reassured that her dataset will always be up to date and directly connected to the other data sources in the organization.
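How might a platform spot such a reference column? One plausible heuristic, sketched below under invented names and data, is to flag an uploaded column as a likely reference when nearly all of its values appear in an existing referential’s key column.

```python
# Hedged sketch of a check a platform could run on upload: flag an uploaded
# column as a likely reference to an existing referential when (nearly) all
# of its values appear in that referential's key column.

def looks_like_reference(column_values, referential_keys, threshold=0.95):
    """Return True if the share of values found in the referential meets the threshold."""
    values = [v for v in column_values if v is not None]
    if not values:
        return False
    hits = sum(1 for v in values if v in referential_keys)
    return hits / len(values) >= threshold

customer_referential = {"C001", "C002", "C003", "C004"}  # invented key set
survey_customer_ids = ["C001", "C002", "C002", "C004"]   # invented upload column

print(looks_like_reference(survey_customer_ids, customer_referential))  # True
```

A real platform would combine this with column-name matching and data-type checks, but a containment test like this is the essence of the suggestion Barbara received.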
Do not lose sight of the big picture
The following day, Barbara receives a call from Jim, the department’s data architect. Jim wants to make sure Barbara is finding her way around the platform. His role is to keep the organization’s global data model well defined, coherent, and well understood by its different users. Talking about the new platform, Jim often says: “Since this platform was deployed, I feel like a gardener in a growing garden. Our global data model gets richer every day, and we have considerably improved both the quality of our data and our capacity to learn from all the ongoing experiments.”
Security and collaboration
Barbara has set her new data source as public so that everyone in the organization can use it. Had it contained sensitive data, she could have granted access only to specific users or groups.
The platform also provides multiple collaboration features, such as starting a discussion or attaching comments to specific elements of a dataset. Every data source on the platform, whether loaded manually or representing an existing information system, can be queried through standardized application programming interfaces (APIs).
At the end of the day
- Barbara was able to test her idea and identify some key drivers of user satisfaction, which led to very rich discussions with the product development teams.
- The organization’s data model is getting richer every day while keeping a coherent big picture thanks to Jim’s engagement.
- Employees are progressively abandoning Excel files for an online collaboration experience which amplifies the positive network effects of the platform.
In summary, an organization that wants to build a sustainable competitive advantage from artificial intelligence needs to:
- Succeed in creating a data-driven culture that fosters experimentation and data-driven decision-making.
- Leverage the power of data-science platforms to democratize the use of machine learning algorithms and data-preparation tools.
- Empower its employees to take action by facilitating access and collaboration around data while ensuring the security of data and the coherence of the organization’s global data model.