This post features a guest interview with Diego M. Oppenheimer, CEO at Algorithmia.
Over the past few years, machine learning has grown in adoption within the enterprise. More organizations are realizing the importance of machine learning to deliver results for their businesses. Increasingly, data scientists and machine learning engineers are storing models and even canonical data sets on GitHub so they can productize their work.
To explore how businesses are making machine learning enterprise-ready, our VP of Business Development and Partner Engineering, Dana Lawson, sat down with Diego M. Oppenheimer, CEO at Algorithmia, to talk about the current state of MLOps (machine learning operations) and what companies should consider as part of their own machine learning workflows.
Let’s dive in!
Machine learning operations (MLOps) is the discipline of delivering machine learning models through repeatable and efficient workflows. In short, it’s what enables businesses to scale machine learning in production to the point of delivering significant results. And it’s going to be an essential component for enterprises industrializing their AI efforts in the future.
Besides common architectural challenges (such as hardware orchestration, container management, load balancing, and inference API management), organizations also struggle with security, governance, and versioning of ML artifacts—this is a challenge that must be solved to ensure that machine learning can be widely productized in the future. This is where MLOps comes in, and why businesses need it to unlock the value in their AI.
What are most machine learning teams doing today, and what are their biggest challenges in operationalizing their work?
Today, most machine learning teams in the enterprise are working with a disparate toolchain. They do this because there are different tools optimized for each part of the ML lifecycle—tools for storing data sets, tools for managing large models, tools for versioning notebooks, for evaluating and testing models, and for deploying to production.
In addition to maintaining and productizing their own models, ML teams need a way to continuously collaborate with other teams. Many ML teams are incorporating DevOps principles into their work, but others still work in their own research silos, apart from product or go-to-market teams that can help them put all the pieces together in a secure, compliant, and reliable way to reach business goals.
In our 2021 enterprise trends in machine learning report, we found that 83% of organizations have increased their budgets for AI and ML year-on-year, and the average number of data scientists employed has increased by 76% over the same period. But those same organizations are struggling to manage and scale those efforts. Our report found that the time required to deploy an ML model has actually increased—implying that many organizations are manually scaling their ML efforts rather than focusing on the underlying operational efficiencies that enable businesses to achieve greater results through ML. In other words, they’re taking on more technical debt instead of fixing a broken ML lifecycle.
Related to the challenge of technical debt, we’re also seeing that organizations are struggling with a variety of operational issues, especially when it comes to governance. In our report, 56% of survey respondents indicated that they struggle with IT governance, security, and auditability requirements—and 67% reported needing to comply with multiple regulations. This was the top issue reported by respondents, but a variety of other issues spanned across the ML lifecycle. For example, 49% reported integration and compatibility issues surrounding ML technologies, programming languages, and frameworks, making that the second-most-common challenge.
To solve the problem of disparate ML tooling, there needs to be a clearly defined, canonical stack for AI and machine learning. Algorithmia has recently joined the AI Infrastructure Alliance, a group of like-minded companies that are coming together to define this canonical stack. Our hope is that open standards will help accelerate adoption and innovation with machine learning, making it truly portable and scalable, with endpoints to help organizations manage security, governance, and monitoring, much as we see with open source standards like Linux or Kubernetes today.
Ideally, with those open standards, businesses will be able to train, evaluate, and host models anywhere. Similar to containerized applications today, you’ll be able to deploy models anywhere—whether it’s in the cloud, on the edge, or somewhere in between for fog computing.
At GitHub, we love automation and the interconnected toolchain. What do you see as some of the benefits of embracing MLOps for deployment automation?
Besides freeing you from time-consuming and error-prone manual operations, automated deployments are an important component of model governance. As a policy, some organizations require that deployments involve multiple approvals and are only done through automated processes.
In addition, continuous deployment workflows help with tracing the deployed artifacts to their sources, while also making the deployment process repeatable. So productionizing your models on Algorithmia through rule-based automations is a good practice as part of your organization’s model governance framework.
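To make the governance idea above concrete, here is a minimal sketch of a rule-based deployment gate: a deploy is allowed only with enough approvals, passing automated checks, and a traceable source commit. The field names and thresholds are hypothetical for illustration; they are not part of Algorithmia's API or any specific CI system.

```python
# Illustrative sketch only: a rule-based deployment gate of the kind a model
# governance framework might enforce. All names here are hypothetical.
from dataclasses import dataclass

@dataclass
class DeployRequest:
    model_version: str      # the model artifact being promoted
    approvals: int          # sign-offs recorded by the CI/CD system
    tests_passed: bool      # result of the automated evaluation suite
    source_commit: str      # traceable link back to the model's source

def approve_deployment(req: DeployRequest, min_approvals: int = 2) -> bool:
    """Return True only when the governance policy is satisfied:
    enough approvals, passing tests, and a traceable source commit."""
    return (
        req.approvals >= min_approvals
        and req.tests_passed
        and bool(req.source_commit)
    )
```

In a real pipeline this check would run inside the automated deployment workflow itself, so that no model reaches production outside the approved, repeatable process.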
Algorithmia is machine learning operations (MLOps) software that manages all stages of the production ML lifecycle within existing operational processes so you can put models into production quickly, securely, and cost-effectively. To get started making your machine learning enterprise-ready, check out Algorithmia’s actions in GitHub Marketplace, and visit the walkthrough on the Algorithmia blog.