ML Architecture Strategies for Three Stages of MLOps

By Saif Abid
Updated January 16, 2024

If you are building out an MLOps practice in your organization, you must start by making sure your data scientists can work efficiently. The last thing you want is data scientists who spend most of their time on tasks like data cleaning or manual model deployment, which should be almost entirely automated.

To enable your data science practice, you must adopt machine learning infrastructure that fits your organization's business case. The right infrastructure provides better experimentation tooling and makes your data scientists more productive. Getting there means building a strategy for adopting the right tools for data analysis, experimentation, feature stores, training, ML pipelines, model registries, monitoring, and many other key ML activities. With that infrastructure in place, your organization will see faster model iteration and, ultimately, more results in day-to-day business operations.

Here at Bitstrapped, we have navigated the pains and blind spots of developing MLOps practices through our work with clients. Because this is such a recurring challenge for organizations, we have broken the production ML lifecycle down into three distinct architectures. Each architecture fits a stage of the ML lifecycle, and our aim in writing this article is to help your organization understand how to adopt MLOps at its current stage.


Architecture patterns for each stage of MLOps

Here are the architecture strategies for each stage of MLOps:

  1. Minimum Viable ML Infrastructure. You can consider this the Proof of Concept (POC) architecture, capable of supporting a single model.
  2. Production MLOps. MLOps for organizations with more than one model running in production and supporting product or service offerings.
  3. Enterprise MLOps. MLOps for teams that are leveraging machine learning across multiple products, teams, and subdivisions of the organization and need a consistent way to manage all pipelines and model lifecycles. Think of Apple, which is infusing ML across its product line.

The ability to excel at each stage, and ultimately to progress, depends on an organization's infrastructure readiness. As an organization matures in its machine learning journey and increases the number of models it runs, it will want to evolve from one architecture to the next. This mirrors how a startup goes from "0-1"; in this case the journey continues with "1-2" and "2-3".

Below are the three architecture patterns we have observed as common solutions to the requirements present at each of the three maturity levels.
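
As a rough illustration of how the three stages relate, the sketch below maps an organization's profile to a stage. The function name, its parameters, and the thresholds are assumptions made for this example; in practice the decision also depends on team structure and infrastructure readiness.

```python
from enum import Enum


class Stage(Enum):
    MINIMUM_VIABLE = "Minimum Viable ML Infrastructure"
    PRODUCTION = "Production MLOps"
    ENTERPRISE = "Enterprise MLOps"


def suggest_stage(models_in_production: int, products_using_ml: int) -> Stage:
    """Map a rough organizational profile to one of the three stages.

    The cut-offs are illustrative assumptions, not hard rules.
    """
    if products_using_ml > 1:
        # Multi-product, multi-model organization.
        return Stage.ENTERPRISE
    if models_in_production > 1:
        # Single-product, multi-model organization.
        return Stage.PRODUCTION
    # Single-product, single-model (or still validating a first model).
    return Stage.MINIMUM_VIABLE
```

For example, `suggest_stage(models_in_production=4, products_using_ml=1)` returns `Stage.PRODUCTION`.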


Architecture 1: Minimum Viable ML Infrastructure

MLOps for organizations experimenting with their first model

Use Case:

The following architecture suits organizations that are focused primarily on validating a machine learning use case or Proof of Concept. From a technical perspective, at this stage your data scientists are training models manually in Jupyter notebooks (or similar data science tools) on their local laptops, exporting them, and “throwing them over the wall” to engineering to move into production. This architecture primarily supports the single-product, single-model organization.

Challenges this ML architecture addresses:

  1. Elastic scalability for model development, and consistent development environments
  2. Parallel, high-volume data processing
  3. Simplified model deployment and model versioning
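
To make the model versioning item concrete, here is a minimal sketch of a file-based registry that stores each trained model under an incrementing version number. `LocalModelRegistry` and its directory layout are hypothetical, invented for this example; a real setup would more likely rely on a managed or open-source model registry.

```python
import json
import pickle
from pathlib import Path
from typing import Any, Dict, List, Optional


class LocalModelRegistry:
    """Hypothetical file-based registry: one numbered directory per version."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _versions(self, name: str) -> List[int]:
        model_dir = self.root / name
        if not model_dir.exists():
            return []
        return sorted(int(p.name) for p in model_dir.iterdir() if p.name.isdigit())

    def register(self, name: str, model: Any, metadata: Dict[str, Any]) -> int:
        """Store the model under the next version number and return that number."""
        versions = self._versions(name)
        version = versions[-1] + 1 if versions else 1
        target = self.root / name / str(version)
        target.mkdir(parents=True)
        (target / "model.pkl").write_bytes(pickle.dumps(model))
        (target / "metadata.json").write_text(json.dumps(metadata))
        return version

    def load(self, name: str, version: Optional[int] = None) -> Any:
        """Load a specific version, or the latest one when version is None."""
        versions = self._versions(name)
        if not versions:
            raise KeyError(f"no versions registered for {name!r}")
        chosen = versions[-1] if version is None else version
        path = self.root / name / str(chosen) / "model.pkl"
        return pickle.loads(path.read_bytes())
```

Because every call to `register` creates a new numbered directory, engineering can deploy or roll back by version number instead of passing notebook exports around by hand.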


Sample Architecture Diagram


Architecture 2: Production MLOps

MLOps for organizations aiming to deploy more than one model into production

Use Case:

As an organization, you have been able to deploy a model but are unable to monitor its performance or improve it. You may also be thinking about how best to A/B test or roll out new versions of your model. Furthermore, you have started collecting labels to improve model performance but lack automation for re-triggering training pipelines. This tier is a good fit for the single-product, multi-model organization.

Challenges this ML architecture addresses:

  • All challenges addressed in the "Minimum Viable ML Infrastructure" architecture
  • Model performance monitoring at inference time (tracking the predictions being made, plus their labels)
  • Model performance monitoring during training
  • Training pipeline automation based on data and time triggers
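
Two of the items above can be sketched in a few lines: measuring live performance by joining logged predictions with labels as they arrive, and deciding when to re-trigger training from a data trigger or a time trigger. The function names and default thresholds (1,000 new labels, 7 days) are assumptions chosen for illustration only.

```python
from datetime import datetime, timedelta
from typing import Dict, Hashable


def live_accuracy(predictions: Dict[Hashable, int], labels: Dict[Hashable, int]) -> float:
    """Accuracy over the logged predictions that have received a label so far."""
    labeled = [pid for pid in predictions if pid in labels]
    if not labeled:
        raise ValueError("no labeled predictions yet")
    correct = sum(predictions[pid] == labels[pid] for pid in labeled)
    return correct / len(labeled)


def should_retrain(
    new_labels: int,
    last_trained: datetime,
    now: datetime,
    min_new_labels: int = 1000,               # assumed data-trigger threshold
    max_age: timedelta = timedelta(days=7),   # assumed time-trigger threshold
) -> bool:
    """Return True when either the data trigger or the time trigger fires."""
    data_trigger = new_labels >= min_new_labels
    time_trigger = now - last_trained >= max_age
    return data_trigger or time_trigger
```

A scheduler or pipeline orchestrator would call `should_retrain` periodically and start the training pipeline whenever it returns `True`.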


Sample Architecture Diagram

Architecture 3: Enterprise MLOps

MLOps for teams that are leveraging machine learning across multiple products, teams, or subdivisions of the organization, who need a consistent way to unify and manage operations, data, and model lifecycles

Use Case:

As an organization, you are now not only running multiple features powered by ML, but you also have distinct product teams that each manage and run their own machine learning systems. Some models share data, and you are finding either inconsistent data across models or far too much data duplication. Each team builds its own copy of the same features to train its models. This tier is for the multi-product, multi-model organization.

Challenges this ML architecture addresses:

  • All challenges addressed in the Minimum Viable and Production MLOps architectures
  • A more comprehensive model registry (e.g., model cards, granular model version details)
  • A uniform feature access layer
  • Orchestration of all machine learning pipelines, tracked in one place
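
The sketch below illustrates two of these ideas together: a shared feature access layer in which each feature is defined once by name, and a model card that records which features a model version uses. `FeatureLayer`, `ModelCard`, and all field names are hypothetical, chosen for this example rather than taken from any specific feature store or registry product.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ModelCard:
    """Hypothetical registry entry; the fields are illustrative."""
    name: str
    version: int
    owner_team: str
    features: List[str]
    metrics: Dict[str, float] = field(default_factory=dict)


class FeatureLayer:
    """One shared place to define and read features, keyed by name."""

    def __init__(self) -> None:
        self._definitions: Dict[str, Callable[[Dict[str, Any]], Any]] = {}

    def define(self, name: str, fn: Callable[[Dict[str, Any]], Any]) -> None:
        # Refusing duplicates pushes teams to reuse a feature, not re-create it.
        if name in self._definitions:
            raise ValueError(f"feature {name!r} is already defined")
        self._definitions[name] = fn

    def get(self, names: List[str], entity: Dict[str, Any]) -> Dict[str, Any]:
        """Compute the requested features for one entity record."""
        return {n: self._definitions[n](entity) for n in names}


# Example usage with made-up data.
layer = FeatureLayer()
layer.define("order_count", lambda user: len(user["orders"]))
card = ModelCard(name="churn", version=3, owner_team="growth", features=["order_count"])
row = layer.get(card.features, {"orders": [12.0, 30.5]})
```

Since every team reads `order_count` through the same layer, two models cannot silently drift apart on how that feature is computed, and the model card documents the dependency.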


Sample Architecture Diagram


We view building a durable MLOps practice as a prerequisite to transforming your product or service offering with machine learning. Instituting the right MLOps architecture for your stage in the ML journey ensures that your engineering and data science teams have a suitable environment for model deployment, A/B testing, and rollouts of new model versions. Whether you are at the Minimum Viable stage, the Production stage, or building out an Enterprise ML architecture, the infrastructure you choose to invest in will contribute greatly to your likelihood of success.

If you are serious about ML and what it can do for your business, you must be just as serious about the infrastructure that makes it all possible. These infrastructure environments can decide the fate of your ML projects and the satisfaction of your team. Getting infrastructure right at each stage unlocks the possibility of graduating to the next one. The ultimate goal is to incorporate ML into your product or service to give you a competitive edge in your business.


Questions?

If you'd like to dive deeper into this ML architecture approach, please reach out via our contact page.



Article By

Saif Abid

I have a deep interest in, and experience working with, Golang, Distributed Systems, Data Engineering and ML. When I’m not working with teams to figure out what to build next or how to build it, I'm heads down getting services built and shipped to customers using best-in-class tech for the problems at hand.