Machine learning (ML) and Data Science (DS) projects are hard to manage. Because projects are research-like in nature, it’s difficult to predict how long it will take for them to finish. They often start off with one idea and then pivot into a new direction when the proposed technique doesn’t work or the assumptions made about the data are proven wrong.
Model-building is also an inherently long process (compared to most Software Engineering and Analytics work) and it’s not uncommon for a data scientist to enter a rabbit hole and spend months on a project without a clear notion of progress. Another distinction from standard software engineering practices is that model development is usually done by a single person. This serial nature does not lend itself well to traditional collaborative SE workflows such as Kanban and Scrum.
I’ve spent quite some time researching existing workflows (primarily in Jira) to manage DS and ML projects, but without luck. Much of the information out there targets software engineering and focuses on Agile methodologies. I also talked to coworkers and friends in the field but I wasn’t able to find anything that was tailored for machine learning and data science. I noticed that some people try to adapt their workflow to standard engineering practices or, in other cases, don’t try to manage those projects at all. The latter is particularly problematic as it can lead to projects that take too long, have too ambitious a scope, and are more likely to fail.
Since I couldn’t find a good solution, I decided to build my own custom workflow for managing Machine Learning and Data Science projects. This system can be implemented in Jira and allows me to easily monitor and report on the status of projects. It also helps me limit their scope, avoiding overly complex models that take too long to build. Our scientists are provided with a structure that helps them think about how models should be built, which increases their changes of succeeding at a project. I’ve been using this system for a few years now and my group and I are pretty happy with it.
Machine learning projects have well-defined phases
Whether you’re building a complex computer vision algorithm using deep learning, a learning-to-rank model with LightGBM, or even a simple linear regression, the process of building an ML model has well-defined phases. Below is how we break the model-building process into sequential phases, from the initial research all the way to the analysis of the A/B testing results. Notice that each phase has a deliverable (or milestone) which also works as a touch point to sync up with the team or the stakeholders.
This is the initial phase of a project. It comprises talking to stakeholders to understand the project goals and expectations, talking to business analysts to find out what data is available and where to get it, creating some initial queries and investigating the data in order to develop a better intuitive understanding of the problem.
It is also in this phase that the scientist will read the literature and decide on a methodology to address the problem. This includes reading scientific papers and brainstorming ideas with their peers. Sometimes deciding on the methodology will also require learning about existing packages and building some simple prototypes using a Jupyter notebook.
Deliverable: The output of this phase is a detailed plan for the execution of the project with a breakdown of the subsequent phases (i.e, data exploration, modeling, productization, and result analysis) and an associated estimated level of effort (in number of weeks). The methodology and the data to be used must also be specified.
This plan will be shared with the stakeholders for feedback.
2) Data Exploration
This is the traditional phase of exploring the data using Pandas and a Jupiter notebook (or sometimes Tableau) in order to gain insights into the data. Typical analyses include counting the number of rows in the data, creating histograms for different feature aggregations, graphs for trends over time, and multiple distribution plots. Scientists will also build queries that will be the core of their model ETL.
Deliverable: A detailed data exploration report as a Jupiter notebook with graphs and comments providing insights into the data. This report will be shared with rest of the group and the project stakeholders.
This is the meat of the project. Here scientists will start building their models using our internal framework. This includes building an ETL, performing feature engineering, and training models. It also includes building baseline models and providing an extensive evaluation of the final solution.
Deliverables: The outputs of this phase are:
- A model prototype
- A report in Jupyter notebook with an extensive evaluation of the model
The final report will be shared with the group and the project stakeholders.
This phase is about implementing the final version code. Some common tasks include adding comments to all functions and making sure the code is formatted properly according to Python standards and the standards of the group. The code is instrumented with reporting metrics such as the number of rows pulled, the number of rows in the output, the prediction error according to several metrics, and the feature importances when applicable. Finally the code is reviewed by one data scientist and one engineer.
Sometimes the productization process will lead to a back and forth interaction with the platform engineers. This is particularly expected for real time models where runtime performance is critical. It’s also possible that the memory requirements for the code are too aggressive, leading to problems down the production pipeline. The engineers might push back and require a reduction in the memory footprint for training the model.
Deliverable: The output of this phase is a committed code to the master branch that is ready for deployment by the platform engineering team.
5) A/B Testing
Most models will undergo an A/B testing phase. Here, scientists and stakeholders decide on the details of the test: how long it will run for, with what percentage of traffic, what is the control, how to interpret the results, etc. While the test is running, team members will mostly focus on other projects but they will need to monitor the test.
6) Results Analysis
Every scientist is responsible for a detailed analysis of their own model results. Here they’ll analyze the results metrics in many different ways to understand what’s really going on. In particular, when the test is unsuccessful we’ll need to do a deep dive into the results to figure out what went wrong.
- A detailed report of the results in a Jupyter notebook.
- A hypothesis as to why things didn’t go as expected (when applicable)
The final report will be shared with the group and the stakeholders of the project.
Working with Jira
While this framework might look great in theory, the reality is that the above phases are rarely purely sequential. For example, it’s very common to jump back and forth from data exploration to modeling and then back to data exploration. Also, this process doesn’t fit into an existing Jira framework, so how can you implement this in practice?
It’s actually pretty easy. We use a Jira Kanban board and swimlanes (one per team member) with a few custom fields and changes. The guidelines below define the essence of our process:
- A new Epic ticket is created for each project and the work is split into Tasks.
- Every Task is tagged with a Phase, a custom field in Jira for selecting one out of the 6 phases listed above. (Notice that a phase can have multiple Tasks.)
- Tasks cannot be longer than 1 week (5 days). This forces team members to break down their work into smaller (but still pretty sizable) chunks, allowing for progress monitoring with minor overhead.
- There can only be one ticket in progress at any time. This ensures that we always know in what state the project is.
- Phases are not always sequential and it’s OK to move back and forth between Phases as new Tasks are created.
Managing ML and DS projects doesn’t have to be complicated. At first, I spent about 30 min a day monitoring this process, but once the team got used to it, my time reduced to 15 minutes a week! I know in what state each project is at any point in time, how long a project has taken, and I can quickly identify issues so I can jump in and help my team when needed. My data scientists have a clear framework for thinking about how models are built and they have become much more efficient at it.
I hope you can find this useful as much as it’s been for me.