
MLOps Lifecycle: What are the stages of MLOps development

Saif Abid
January 16, 2024

To successfully develop and deploy a machine learning (ML) application that is resilient and robust, it’s important to understand how to design a bulletproof ML application development workflow.

In this post, I am going to take you through the machine learning lifecycle we use here at Bitstrapped with our client projects. This process ensures that the resulting production ML application delivers on its objectives and is resilient and adaptive over time.

The workflow we use has two distinct phases: An Experimental Phase and a Production Phase. Each phase of the workflow has a set of unique workflow stages. Below I have defined each stage, its role, and its objectives to help you better understand the entire methodology. 

Experimental Phase

The Experimental Phase of the workflow is broken down into three key stages:

  • Problem identification and data collection and analysis
  • ML model selection and development
  • Data experimentation and model training, which includes tuning of model hyperparameters

Production Phase

The Production Phase of the workflow has four key stages:

  • Transform data
  • Train the machine learning model
  • Serve the model for online/batch prediction
  • Monitor the model's performance
MLOps lifecycle - Stages of development

Stages of the Experimental Phase

Let’s now step through each stage of the workflow to help you understand why each element is important and what each one is designed to do. 

To help illustrate the workflow, we’ll use a sample use case where we’re deploying a machine learning application to detect patient falls in video footage acquired from video equipment in a hospital environment.

Step 1A: Problem identification and data collection and analysis

In any ML application workflow process, the first step is to define the problem we are trying to solve for our client and collect data that we can use with our machine learning models. 

In our fall-detection application example, we would first want to collect video data in a hospital. The application we develop will use image processing to detect patient falls from the live video feeds. In a production environment, once a fall is detected the application would report the incident to the nursing or support staff that is caring for the patients and trigger an alert so staff can respond and help the fallen patient. 

Data Collection and Ingestion

The process starts with data collection from cameras set up in the institution. If cameras don't already exist, they are installed to specification. If they do exist, our engineers extract sample video data from them to experiment with.

The sample video would then be ingested into a data warehouse in our development environment. We only work with a sample because experimenting on live video would be onerous. Using a subset of the dataset makes testing easier; only once we transition to the Production Phase do we introduce the full dataset to the application.
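Drawing that experimentation subset can be as simple as deterministically sampling a fraction of the collected clips before ingestion. Here is a minimal sketch; the clip paths and the 10 percent ratio are purely illustrative, not from a real deployment:

```python
import hashlib

def sample_clips(clip_ids, fraction=0.1):
    """Deterministically select a fraction of clips for experimentation.

    Hashing each clip ID gives a stable pseudo-random choice, so the
    same sample is drawn every time the pipeline runs.
    """
    keep = []
    for clip_id in clip_ids:
        digest = hashlib.md5(clip_id.encode()).hexdigest()
        # Map the first 8 hex digits to [0, 1] and keep the lowest fraction.
        if int(digest[:8], 16) / 0xFFFFFFFF < fraction:
            keep.append(clip_id)
    return keep

# Hypothetical clip inventory from four ward cameras.
clips = [f"ward3/cam{c}/clip{i:04d}.mp4" for c in range(4) for i in range(250)]
sample = sample_clips(clips, fraction=0.1)
```

Because the selection is hash-based rather than random, re-running ingestion always yields the same experimentation subset, which keeps later experiments comparable.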

Data Labeling

The next step in the MLOps life cycle is data labeling.

In our example, our team of data scientists would use a data labeling software tool to mark up the video and isolate 3- to 5-second video segments that capture examples of patient falls. We might start with 1,000 hours of footage for experimentation.

That can be a very labor-intensive process, so our data scientists use a variety of tools and services to speed up the labeling process or, in some cases, automate labeling of the test data.

You need to know which video feeds had people falling in them and which ones didn't. We might choose to label video of people sitting, standing, or walking to help identify behaviors captured on video.

This helps a machine learning application define what is happening in a video and what a patient fall looks like relative to the other activities shown in the video. This makes it easier to accurately detect patient falls from live footage in a production environment.

We would also use labeling to identify specific elements of the video footage. If the video sources show patient rooms, then we could also use labeling to identify unique physical spaces in the hospital. This information is used in the ML model to help it understand the context of the video and how it relates to the behavior you want to identify; in our example, a patient fall.
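One way to represent the output of this labeling step is a simple record per video segment, carrying the activity label and the physical location. This is only a sketch; the field names and example values are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LabeledSegment:
    """One labeled 3- to 5-second segment of hospital video."""
    clip_id: str    # source video file
    start_s: float  # segment start, in seconds
    end_s: float    # segment end, in seconds
    activity: str   # e.g. "fall", "sitting", "standing", "walking"
    location: str   # physical space, e.g. "room_204"

segments = [
    LabeledSegment("ward3/cam1/clip0007.mp4", 12.0, 16.5, "fall", "room_204"),
    LabeledSegment("ward3/cam1/clip0007.mp4", 40.0, 43.0, "walking", "hallway_3"),
]

# Training needs both positive (fall) and negative (non-fall) examples.
falls = [s for s in segments if s.activity == "fall"]
```

Keeping non-fall activities as explicit labels, rather than just "not a fall," is what lets the model learn falls relative to other behaviors.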

1B: Machine Learning Model Selection

Once the sample data has been labeled, our team then selects a machine learning model and starts to test different algorithms to provide the application with the best method of identifying a fall in the sample video. This is where the data scientists apply their math and magic.

We might use any number of video image processing algorithms, and for each algorithm, we would test different parameters.

We might test color video versus black and white footage. Or we might adjust picture quality parameters such as contrast, brightness, or sharpness. We may even test various video resolutions. All these adjustments and tests help us optimize the ML application’s accuracy for fall detection. 
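Those preprocessing tests amount to a sweep over a small grid of settings, scoring each combination. The sketch below uses a dummy scoring function as a stand-in for a full train/validate cycle; the settings, weights, and scores are illustrative only:

```python
import itertools

def evaluate(grayscale, contrast, resolution):
    """Stand-in for training and scoring a fall-detection model with the
    given preprocessing settings. A real version would run a full
    train/validate cycle; here we just return a dummy score."""
    score = 0.70
    if not grayscale:
        score += 0.05          # pretend color footage helps
    score += 0.02 * contrast   # pretend higher contrast helps a little
    score += {720: 0.0, 1080: 0.03}[resolution]
    return score

grid = {
    "grayscale": [True, False],
    "contrast": [0.8, 1.0, 1.2],
    "resolution": [720, 1080],
}

# Try every combination of preprocessing settings and keep the best.
best = max(
    (dict(zip(grid, combo)) for combo in itertools.product(*grid.values())),
    key=lambda params: evaluate(**params),
)
```

Swapping the dummy `evaluate` for a real training run is all it takes to turn this into a genuine preprocessing sweep.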

1C: Experiment with Data and Model Training and Tuning with Hyperparameters

As the data scientists test various algorithms and parameters with those algorithms, we also want to track each attempt and each test iteration.

That is because, in the future, we might want to go back to a specific model we tested and see what data we tested it on and what parameters we used. This is all documented using a tool like Vertex AI.

This is important in MLOps because you can only improve on something if you know exactly what you had to begin with. If that model starts to fail in the future or the data changes and you want to improve it, you should at least know how it was created and under what conditions. 
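Whatever tracking tool is used, each run boils down to a record of parameters, data version, and metrics. A minimal local sketch of that idea follows; in practice Vertex AI Experiments or a similar tracker replaces this, and the parameter names and paths here are hypothetical:

```python
import json
import time
from pathlib import Path

def log_run(log_dir, params, data_version, metrics):
    """Append one experiment run to a JSON-lines log so any past model
    can be traced back to its exact data and parameters."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "timestamp": time.time(),
        "params": params,
        "data_version": data_version,
        "metrics": metrics,
    }
    with open(log_dir / "runs.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")
    return record

run = log_run(
    "/tmp/fall_detection_experiments",
    params={"algorithm": "3d_cnn", "grayscale": False},
    data_version="sample_v1",
    metrics={"val_accuracy": 0.87},
)
```

The append-only log means no experiment is ever overwritten, which is exactly the property you want when reconstructing how a model was created.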

At this point in the MLOps lifecycle, our data scientists will also transition off their local computers and start their work on the Google Cloud Platform (GCP) to leverage speed and performance. This elastic computing capability allows them to quickly expand compute, memory, and storage resources on demand without worrying about capacity planning and engineering for peak usage. GCP offers performance that a local computer can't provide.

Hyperparameter Tuning

Once our data scientists have selected a good algorithm and worked to optimize it using labeled data and further tweaks, they will arrive at fairly robust functionality.

However, we're not done yet. The team then moves to the next stage, where they iterate and tune the model even more. In this optimization step, we look for better parameter values that improve performance. We call this hyperparameter tuning.

Hyperparameters that are tested might include adjusting the number of layers that are used in a neural network. Hyperparameter selection might also include weights and biases which can be adjusted for optimum performance. There may be any number of additional hyperparameters we might test in our efforts to optimize the ML model’s performance.
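A common way to explore that hyperparameter space is a seeded random search: sample candidate settings, score each, and keep the best. The sketch below uses a dummy validation function in place of real training, and the layer counts and learning-rate ranges are illustrative:

```python
import random

def validate(num_layers, learning_rate):
    """Stand-in for training the network and measuring validation
    accuracy. A real version would train on the labeled segments;
    here we pretend accuracy peaks at 4 layers and lr near 1e-3."""
    return 0.9 - 0.02 * abs(num_layers - 4) - 50 * abs(learning_rate - 1e-3)

rng = random.Random(0)  # seeded so the search is reproducible
trials = []
for _ in range(20):
    params = {
        "num_layers": rng.choice([2, 3, 4, 5, 6]),
        # Sample learning rates log-uniformly between 1e-4 and 1e-2.
        "learning_rate": 10 ** rng.uniform(-4, -2),
    }
    trials.append((validate(**params), params))

best_score, best_params = max(trials, key=lambda t: t[0])
```

Seeding the random generator keeps the search itself reproducible, which matters for the same record-keeping reasons discussed above.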

Finishing the Experimental Phase

By the end of the Experimental Phase of the MLOps lifecycle, the result is an algorithm that is set up, functioning well on sample data, and demonstrably effective. We would also have a record of all the experimentation conducted to date and the outcomes it produced.

Stages of the Production Phase

At this point in the MLOps life cycle, we have completed the Experimental Phase and iterated through it until our team determines it is time to move into the Production Phase.

Around our shop, we call it “productionalizing” the model.  (Some people call it “productionizing” the model.) You won’t find this industry buzzword in most dictionaries, but the definition of this term is that we put the machine learning model into production.

2A: Transform the Data

In this phase, the objective is to put the fully tested ML application (a packaged binary) onto the camera system in the hospital. We are not done at this point, however.

The first step in this Production Phase is to train the application with the full set of data. While we may have experimented earlier with 100 GB of data, we now need to train the application on the full dataset. So we'd scale up from the 100 GB of sample data we experimented with to 100 TB of video.

2B: Train the Model

Once the model is trained on the full dataset we need a model registry to track versions of the model as it is deployed into production and tested. 

Version Control

For that, we can use a tool like DVC, or we can use the Vertex AI model registry. The model registry is a repository that stores all of the different model versions through the application's production life cycle.
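However the registry is implemented, the core idea is a versioned mapping from a model name to its artifacts and metadata. This toy in-memory sketch illustrates the concept; a real registry such as DVC or Vertex AI persists this durably, and the names and URIs here are made up:

```python
class ModelRegistry:
    """Minimal in-memory stand-in for a model registry: each registered
    model gets an auto-incremented version number."""

    def __init__(self):
        # model name -> list of (version, artifact_uri, metadata)
        self._versions = {}

    def register(self, name, artifact_uri, metadata=None):
        versions = self._versions.setdefault(name, [])
        version = len(versions) + 1
        versions.append((version, artifact_uri, metadata or {}))
        return version

    def latest(self, name):
        return self._versions[name][-1]

registry = ModelRegistry()
registry.register("fall-detector", "gs://models/fall/v1", {"val_acc": 0.87})
v2 = registry.register("fall-detector", "gs://models/fall/v2", {"val_acc": 0.91})
```

Because every version stays in the registry, a serving system can always roll back to a known-good model if a new one misbehaves.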

2C: Serve the Model

Once the model is in full production and its versions are tracked in a repository, it can be further tweaked and optimized.

A/B Testing or Canary Testing

We also want to do A/B testing of the model in the production environment and compare it to an earlier version of the model. We might also do what we call “canary testing” where we test a new production binary on a small subset of the data in the production system and evaluate how well it is doing. Then we incrementally increase the volume of data it processes as it proves itself.

From the example, we might start with 1 percent of the video data and increase it to 50 percent, comparing it to a previous version of the application. When the development team is happy with the model's performance, it is then given 100 percent of the data available. At this point, the older version of the model is removed from production.
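One simple way to implement that incremental rollout is a sticky, hash-based split: each feed is deterministically assigned to either the canary or the stable model, and widening the fraction only adds feeds to the canary side. A sketch, with hypothetical feed names:

```python
import hashlib

def route_to_canary(feed_id, canary_fraction):
    """Decide whether a video feed is served by the canary model.

    Hashing the feed ID gives a sticky, deterministic split, so the
    same camera feed always hits the same model version for a given
    canary fraction.
    """
    digest = hashlib.sha256(feed_id.encode()).hexdigest()
    return int(digest[:8], 16) / 0xFFFFFFFF < canary_fraction

# Start the new model on 1% of feeds, then ramp up as it proves itself.
feeds = [f"camera-{i}" for i in range(1000)]
canary_1pct = [f for f in feeds if route_to_canary(f, 0.01)]
canary_50pct = [f for f in feeds if route_to_canary(f, 0.50)]
```

Because the hash threshold only widens, every feed in the 1 percent group stays in the canary group at 50 percent, so no feed flips back and forth between model versions during the ramp-up.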

2D: Monitor the Model’s Performance

Drift Monitoring

Once we deploy a model into production, it is important to monitor the system for CPU and memory usage, and other project-specific infrastructure performance like data bandwidth, camera power states, etc.

However, we also conduct what is called "drift monitoring". This is also sometimes called model drift or data drift. There are two forms of model drift. The first is called train-inference drift, or skew; you can use either term.

What you're looking for here is a major change in the production environment in which the model is operating.

To use the fall-detection application example, let's say the application has been deployed for five years and functions extremely well. But then the facility decides to upgrade all the cameras to the next generation of equipment. Now the system is receiving ultra-high-definition video feeds: the resolution goes from 1280x720 pixels (720p HD) to 3840x2160 pixels (4K).

The original machine learning code has not changed, but the data quality has. With machine learning, if the data changes, the application may need to adapt to it automatically.
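Detecting this kind of train-inference skew can start with something very simple: compare the properties of incoming frames against a profile of the training data and flag mismatches. A sketch, where the profile fields and example resolutions are illustrative:

```python
def check_serving_skew(training_profile, incoming_frame_meta):
    """Flag frames whose properties differ from what the model was
    trained on -- for example, a camera upgrade from 720p to 4K.

    Returns a list of human-readable skew warnings (empty = no skew).
    """
    warnings = []
    for key in ("width", "height", "color_space"):
        expected = training_profile[key]
        actual = incoming_frame_meta[key]
        if expected != actual:
            warnings.append(f"{key}: trained on {expected}, now serving {actual}")
    return warnings

trained_on = {"width": 1280, "height": 720, "color_space": "bgr"}
new_camera = {"width": 3840, "height": 2160, "color_space": "bgr"}

skew = check_serving_skew(trained_on, new_camera)
```

In the camera-upgrade scenario above, both the width and height checks fire, which is exactly the signal that should trigger retraining or input normalization.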

Long-Term Inference or Data Drift

Data trends can also shift over time. We want to be able to detect those shifts and automatically update and retrain our models to accommodate them.

Here is how data drift might show up in our example ML application. Let's say the model was typically trained on video of adults in a facility, but now the hospital reconfigures its services to care for more children as patients. It could be that the model doesn't detect children falling as effectively as it does adults.

Or the hospital renovates a wing and the position of the beds changes when the patient rooms get a design refresh. That too could require a model update.

The server that runs the model can monitor for drift. When it is detected, the server can either send out an alert that kicks off human intervention, or trigger automated pipelines engineered to self-correct the model.
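One simple way to implement that monitoring is to compare a recent window of inputs or predictions against a baseline captured at training time, and alert once the difference crosses a threshold. The sketch below scores the shift with total variation distance; the patient-mix numbers and the 0.2 threshold are illustrative assumptions:

```python
def drift_score(baseline, recent):
    """Total variation distance between two class distributions,
    e.g. the fraction of adult vs. child patients seen in the footage."""
    classes = set(baseline) | set(recent)
    return 0.5 * sum(abs(baseline.get(c, 0.0) - recent.get(c, 0.0))
                     for c in classes)

def check_drift(baseline, recent, threshold=0.2):
    """Return an alert string when the data has drifted past the
    threshold; a real system would page staff or kick off retraining."""
    score = drift_score(baseline, recent)
    if score > threshold:
        return f"drift detected (score={score:.2f}): consider retraining"
    return None

baseline = {"adult": 0.95, "child": 0.05}  # mix seen at training time
recent = {"adult": 0.60, "child": 0.40}    # mix after the ward change

alert = check_drift(baseline, recent)
```

In the pediatric-ward scenario from the example, the distribution shift scores 0.35 and the alert fires, which is the cue to retrain on footage that includes children.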

A machine learning development team will work to anticipate this and automate a response as much as possible. That is the key to machine learning: adapting the model as inputs change.


This machine learning workflow is a highly tuned process designed to ensure that the MLOps lifecycle is correctly designed, tested, and deployed so that the application achieves the results our clients are seeking. Even though we used an image processing application as an example, this same machine learning lifecycle can be used in any ML project on any application. The use case may change but the way we approach it at Bitstrapped doesn't. We use a tried and true formula in all our ML development projects. If you have questions or would like a free discovery call with our team, contact us.

Article By

Saif Abid

I have a very deep interest and experience working with Golang, Distributed Systems, Data Engineering and ML. When I’m not working with teams to figure out what to build next or how to build it, I'm heads down getting services built and shipped to customers using best in class tech for the problems at hand.