Posted On: Aug 11, 2020
Amazon SageMaker Operators for Kubernetes make it easier for developers and data scientists using Kubernetes to train, tune, and deploy machine learning (ML) models in Amazon SageMaker.
Customers use Kubernetes, a general-purpose container orchestration system, to set up repeatable pipelines and maintain greater control and portability over their workloads. But when running ML workloads in Kubernetes, customers also have to manage and optimize the underlying ML infrastructure, ensure high availability and reliability, provide ML tools to make data scientists more productive, and comply with appropriate security and regulatory requirements. With Amazon SageMaker Operators for Kubernetes, customers can invoke SageMaker using the Kubernetes API or Kubernetes tools such as kubectl to create and interact with their ML jobs in SageMaker. This gives Kubernetes customers the portability and standardization benefits of Kubernetes and Amazon EKS, along with the benefits of fully managed ML services in Amazon SageMaker.
Customers can use Amazon SageMaker Operators for model training, hyperparameter optimization, real-time inference, and batch inference. For model training, Kubernetes customers can now leverage all the benefits of fully managed ML model training in SageMaker, including Managed Spot Training to save up to 90% in cost, and distributed training to reduce training time by scaling to multiple GPU nodes. Compute resources are provisioned only when requested, scaled as needed, and shut down automatically when jobs complete, ensuring near 100% utilization. For hyperparameter tuning, customers can use SageMaker’s Automatic Model Tuning, saving data scientists the days or even weeks they would otherwise spend improving model accuracy. Customers can also use Spot Instances for Automatic Model Tuning. For inference, customers can use SageMaker Operators to deploy trained models in SageMaker to fully managed auto-scaling clusters, spread across multiple Availability Zones to deliver high performance and availability for real-time or batch prediction.
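As a rough illustration of what submitting a training job through the Kubernetes API could look like, the Python sketch below builds a manifest for a TrainingJob custom resource with Managed Spot Training turned on and prints it as JSON. It is a minimal sketch, not the operators' reference usage. The `apiVersion` string and the field names under `spec` are assumptions based on the operators' sample manifests and should be checked against the CRD installed in your cluster. The helper `build_training_job`, the image URI, the role ARN, and the S3 path are hypothetical placeholders.

```python
import json


def build_training_job(name, image, role_arn, region, s3_output,
                       instance_type="ml.m5.xlarge", instance_count=1,
                       max_runtime_s=3600, use_spot=False, max_wait_s=None):
    """Build a dict shaped like a TrainingJob custom resource.

    Field names are assumptions; verify them against the installed CRD.
    """
    stopping = {"maxRuntimeInSeconds": max_runtime_s}
    if use_spot:
        # With Managed Spot Training, the maximum wait time (waiting for
        # Spot capacity plus running) must be at least the maximum runtime.
        wait = max_wait_s if max_wait_s is not None else 2 * max_runtime_s
        if wait < max_runtime_s:
            raise ValueError("max_wait_s must be >= max_runtime_s")
        stopping["maxWaitTimeInSeconds"] = wait

    return {
        "apiVersion": "sagemaker.aws.amazon.com/v1",  # assumed API group/version
        "kind": "TrainingJob",
        "metadata": {"name": name},
        "spec": {
            "algorithmSpecification": {
                "trainingImage": image,
                "trainingInputMode": "File",
            },
            "roleArn": role_arn,
            "region": region,
            "outputDataConfig": {"s3OutputPath": s3_output},
            "resourceConfig": {
                "instanceCount": instance_count,
                "instanceType": instance_type,
                "volumeSizeInGB": 5,
            },
            "stoppingCondition": stopping,
            "enableManagedSpotTraining": use_spot,
        },
    }


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    job = build_training_job(
        name="example-training-job",
        image="123456789012.dkr.ecr.us-east-1.amazonaws.com/example:latest",
        role_arn="arn:aws:iam::123456789012:role/example-sagemaker-role",
        region="us-east-1",
        s3_output="s3://example-bucket/output",
        use_spot=True,
    )
    print(json.dumps(job, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest could be saved to a file and submitted with `kubectl apply -f`, assuming the operators are already installed in the cluster. Generating the manifest in code rather than by hand makes it easier to enforce constraints such as the wait-time check before anything reaches SageMaker.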