Containers as an enabler of AI

Containers and microservices are accelerating AI development, allowing organizations to build applications once and run them anywhere.

When companies first started to use Docker containers in the early 2010s, their main focus was on solving the portability problem: getting software to execute and produce the same results no matter which host system it was running on. Breaking applications into individual microservices that can be modified and scaled without impacting the whole system brought predictability and reliability to development environments.

Building complex artificial intelligence (AI) applications wasn’t high on container users’ priority lists. AI apps took a lot of work to build and a lot of resources to deploy. Now, as containers grow in popularity and AI adoption enters the mainstream, enterprises are starting to leverage containerization to gain flexibility, portability, and reliability for the AI and machine learning lifecycle.

The AI market is exploding. In North America alone, the AI market for hardware, software, and services is expected to grow from $21 billion in 2018 to $203 billion in 2026. AI is playing an important role in uses ranging from self-driving cars to digital voice assistants to sentiment analysis.

AI expansion is being driven by a number of factors. These include the widespread availability of large-scale datasets from many sources, increased organizational awareness of the potential value of data, more readily accessible AI tools and technology, cheaper compute, and a growing number of data scientists and engineers. In short, people are seeing that results from AI can actually pay off.

The value containers bring

Why are companies using containers to facilitate the development and deployment of AI apps? The primary reason is that containers provide flexibility by allowing applications to be built once and run anywhere: on any server, with any cloud provider, on any operating system.

To turn machine learning into improved business outcomes, enterprises are hiring a lot of data scientists. These highly skilled people build mathematical models in software that process vast amounts of historical data to make predictions about business outcomes. Google, Tesla, Amazon, and others have disrupted markets largely based on the insights they generate through advanced machine learning models.

But hiring data scientists isn’t enough. As AI moves from an artisanal pursuit to a more widespread, enterprise focus, companies need to ensure that there are tools and processes in place to take the machine learning models and deploy them into production applications. AI’s potential can be realized only through the use of production-grade tools and technologies.

The challenges of building machine learning models

Creating machine learning models is an iterative process. Data scientists typically go through data exploration, data pre-processing, feature extraction, model training, model validation, and model deployment. It’s not a matter of building once and saying you’re done. If you knew in advance exactly what was going to be needed, you could just write code. Instead, you use data and machine learning to teach and re-teach the software until it converges on a solution that satisfactorily represents the real world.
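The teach-and-re-teach loop can be pictured with a small sketch. Everything in it is an assumption made for illustration: the synthetic data, the one-variable linear model, the function names, and the error target are hypothetical stand-ins for a real project's data, algorithm, and acceptance criteria.

```python
import random

def make_data(n, seed=0):
    """Hypothetical stand-in for collected data: noisy points around y = 3x + 2."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 1) for _ in range(n)]
    return [(x, 3.0 * x + 2.0 + rng.gauss(0, 0.1)) for x in xs]

def train(data, epochs, lr=0.5):
    """Fit y = w*x + b by plain gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def validate(model, data):
    """Mean squared error of the model on held-out data."""
    w, b = model
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def iterate_until_good(target_mse=0.05, max_rounds=10):
    """Retrain with a larger training budget until validation error is acceptable."""
    data = make_data(200)
    train_set, val_set = data[:150], data[150:]
    epochs = 5
    history = []
    for round_no in range(1, max_rounds + 1):
        model = train(train_set, epochs)
        mse = validate(model, val_set)
        history.append((round_no, epochs, mse))
        if mse <= target_mse:
            break  # good enough: stop iterating
        epochs *= 2  # not good enough: train longer and try again
    return model, history
```

Calling `iterate_until_good()` returns the last model together with a history of rounds, which makes the repeated train-and-validate cycle visible. In a real project, each round might also change the features or the algorithm, not just the training budget.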

A machine learning project goes through multiple steps and iterations, starting with experimentation to figure out which algorithms are best suited to the data and business problem at hand. Data science teams often experiment with different training algorithms in different environments simultaneously and pick the one that performs best.

The challenge is planning for and managing highly variable needs for compute power. Training ML models is compute intensive, particularly during the feature extraction and model training phases. Model inferencing (the process of using a trained model and new data to make a prediction) requires less compute power, but the systems that run it need to be reliable, as they are serving up models for critical business functions. To accommodate these variable needs, enterprises are leveraging a hybrid architecture that combines on-premises infrastructure and public cloud to meet the compute needs of data science efficiently and cost-effectively.

How containers benefit the ML lifecycle

The use of containers can greatly accelerate the development of machine learning models. Containerized development environments can be provisioned in minutes, while traditional VM or bare-metal environments can take weeks or months. Data processing and feature extraction are a key part of the ML lifecycle. The use of containerized development environments makes it easy to spin up clusters when needed and spin them back down when done. During the training phase, containers provide the flexibility to create distributed training environments across multiple host servers, allowing for better utilization of infrastructure resources. And once they’re trained, models can be hosted as container endpoints and deployed either on premises, in the public cloud, or at the edge of the network.

These endpoints can be scaled up or down to meet demand, thus providing the reliability and performance required for these deployments. For example, if you’re serving a retail website with a recommendation engine, you can add more containers to spin up additional instances of the model as more users start accessing the website. Then, when demand drops off, you can collapse the containers as they’re no longer needed, improving utilization of expensive hardware resources.
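As a rough illustration of the scale-up and scale-down decision, the sketch below computes how many model-serving containers a given load might call for. The function name, the per-container capacity figure, and the replica limits are all hypothetical, and real orchestrators apply more elaborate rules than this.

```python
import math

def desired_replicas(requests_per_second, capacity_per_replica,
                     min_replicas=1, max_replicas=20):
    """Return how many containers are needed to serve the current load.

    capacity_per_replica is an assumed, measured figure: how many requests
    per second one container can handle comfortably.
    """
    if capacity_per_replica <= 0:
        raise ValueError("capacity_per_replica must be positive")
    needed = math.ceil(requests_per_second / capacity_per_replica)
    # Clamp so the service never scales to zero or beyond the hardware budget.
    return max(min_replicas, min(max_replicas, needed))
```

With an assumed capacity of 100 requests per second per container, a load of 450 requests per second calls for five containers, and the count falls back to the minimum when traffic drops off.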

AI and isolation

Packaging an application and its dependencies in isolation from other containerized applications is particularly useful for AI systems. AI amplifies the need for isolation because it involves more versions of software tools and models than conventional software development. Traditional developers are used to reacting to software updates, and the resulting changes are usually much more subtle than the changes that come from using different versions when building AI models.

For example, data scientists may rightfully be very sensitive to different versions of TensorFlow or PyTorch for use with GPUs. The choice of tools and tool versions can dramatically affect how long it takes for a model to train or which solutions will converge. So data scientists want to be able to control the environment in which a model runs and be able to have multiple environments for different models at the same time. Each model can run without interfering with others.

Reproducibility is also important when you’re looking to retrain a model that was trained before. To ensure accuracy, developers need to bring up the exact environment with all the same versions of tools and dependent libraries. A version change in a dependent library package could throw the results out of whack.
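One way to picture the version check behind reproducibility is the sketch below. It assumes a Python environment and uses the standard library's importlib.metadata to record installed package versions, then reports any drift from a previously recorded snapshot. The function names are hypothetical, and a container image would normally pin these versions at build time rather than check them afterward.

```python
from importlib import metadata

def snapshot_environment():
    """Record the name and version of every installed Python package."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken or missing metadata
            versions[name.lower()] = dist.version
    return versions

def diff_environments(recorded, current):
    """Return packages whose version differs between two snapshots.

    Each entry maps a package name to (recorded_version, current_version);
    None means the package is absent from that snapshot.
    """
    drift = {}
    for name in sorted(set(recorded) | set(current)):
        old, new = recorded.get(name), current.get(name)
        if old != new:
            drift[name] = (old, new)
    return drift
```

A snapshot saved alongside a trained model can later be compared with `diff_environments(saved, snapshot_environment())`. An empty result suggests the retraining environment matches the original, at least at the package level.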

Unlocking AI/ML use cases

Machine learning applications are by nature heavily dependent on data. So deploying them on containers is not as straightforward as deploying web applications or other microservices-based applications on containers. They require special configuration options for persistence of the data in the container. Although containers are great at making applications flexible and portable, it is challenging to manage multiple containers in a complex system. That’s where Kubernetes comes in.

Kubernetes is an open source framework for orchestrating the deployment and management of containerized, cloud-native applications. But open source Kubernetes by itself is not sufficient for enterprise-scale deployments. It needs additional capabilities built around it, ranging from management, monitoring, and persistent storage to security and access controls.

One key capability that is needed to expand the use of containers is the support for persistent storage for stateful applications. For data analytics and ML applications that need access to data, a data layer or data fabric is needed that can ensure these containerized applications all have the same consistent view of the data no matter where they are deployed.
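To make the persistent storage requirement more concrete, here is a minimal sketch that builds two Kubernetes manifests as Python dictionaries: a PersistentVolumeClaim and a pod that mounts it. The resource names, image name, storage size, and mount path are hypothetical, and the sketch says nothing about which storage backend or data fabric actually satisfies the claim.

```python
import json

def make_claim(name, size="100Gi"):
    """Build a PersistentVolumeClaim that several pods can read and write."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteMany lets pods on different nodes share the volume,
            # provided the underlying storage supports it.
            "accessModes": ["ReadWriteMany"],
            "resources": {"requests": {"storage": size}},
        },
    }

def make_training_pod(name, image, claim_name, mount_path="/data"):
    """Build a pod whose container sees the claimed volume at mount_path."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": "trainer",
                "image": image,  # hypothetical image name supplied by the caller
                "volumeMounts": [{"name": "shared-data", "mountPath": mount_path}],
            }],
            "volumes": [{
                "name": "shared-data",
                "persistentVolumeClaim": {"claimName": claim_name},
            }],
        },
    }

def to_manifest(obj):
    """Serialize a manifest as JSON, which Kubernetes accepts alongside YAML."""
    return json.dumps(obj, indent=2)
```

Every pod built against the same claim name mounts the same volume, which is one simple way for containerized jobs to share a consistent view of the data within a cluster.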

More innovation to come

Like software engineering, machine learning can benefit from the agility, portability, and flexibility that containers bring. We are witnessing a lot of innovation in this area. For instance, KubeDirector is an open source project designed to run complex, distributed stateful applications on Kubernetes. Kubeflow is an open source project designed to simplify production ML deployments. As enterprise adoption of containers increases, we are going to see a lot more innovation in this domain.

Containers and AI: Lessons for leaders

  • Organizations are starting to leverage containers in the AI/ML lifecycle for “build once and run anywhere” portability and flexibility.
  • The ability to package an application and its dependencies in isolation from other containerized applications is especially useful for AI deployments.
  • Managing multiple containers in a complex system is challenging, which is where open source Kubernetes comes in. Among other requirements, persistent storage and built-in security are key.

This article was previously published on enterprise.nxt


