Train with Amazon SageMaker
Amazon SageMaker makes it easy to train machine learning (ML) models by providing everything you need to tune and debug models and execute training experiments.
Features
Experiment management and tracking
Machine learning is an iterative process based on continuous experimentation: trying new learning algorithms or tweaking an algorithm’s hyperparameters while observing the impact of each incremental change on model performance and accuracy. Over time, the data generated by these experiments makes it harder to track the best-performing models, the observations and lessons learned during experimentation, and the exact ingredients and recipe that went into creating those models in the first place.
Amazon SageMaker Experiments helps you track, evaluate, and organize training experiments in an easy and scalable manner. SageMaker Experiments is available within Amazon SageMaker Studio and as a Python SDK with deep Jupyter integration.
Analyze and debug with complete insights
It is challenging to get complete insight and visibility into the ML training process. There is no easy way to ensure that your model is progressively learning the correct values for its parameters. For example, when training a computer vision model using a convolutional neural network, you might have to run the training job for hours. During this time, you have no visibility into how the different model parameters are affecting training or whether the training process is yielding the desired results.
Amazon SageMaker Debugger provides full visibility into the training process. SageMaker Debugger makes the inspection easy by providing a visual interface for developers to analyze the debug data, and also by providing visual indicators about potential anomalies in the training process.
One-click training
Training models is easy with Amazon SageMaker. When you’re ready to train in SageMaker, specify the location of your data in Amazon S3, indicate the type and quantity of SageMaker ML instances you need, and get started with a single click. SageMaker sets up a distributed compute cluster, performs the training, outputs the result to Amazon S3, and tears down the cluster when complete.
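The same inputs (data location in Amazon S3, instance type, and instance count) can also be supplied programmatically. The following is a minimal sketch, assuming the field names of the SageMaker CreateTrainingJob API as exposed through boto3. The helper name build_training_job_request, the bucket, the role ARN, and the image URI are hypothetical placeholders, and the request is only assembled here, not submitted.

```python
def build_training_job_request(job_name, image_uri, role_arn, train_s3_uri,
                               output_s3_uri, instance_type="ml.m5.xlarge",
                               instance_count=1, max_runtime_seconds=3600):
    """Assemble keyword arguments for the SageMaker CreateTrainingJob API.

    This helper is illustrative; it only builds the request dictionary.
    """
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,      # container that runs the training code
            "TrainingInputMode": "File",     # copy the data to the instance first
        },
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,   # where the training data lives
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        # Where the trained model artifacts are written.
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        # Type and quantity of ML instances in the training cluster.
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 50,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_seconds},
    }


# All values below are placeholders, not real resources.
request = build_training_job_request(
    job_name="example-training-job",
    image_uri="<training-image-uri>",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    train_s3_uri="s3://example-bucket/train/",
    output_s3_uri="s3://example-bucket/output/",
    instance_count=2,
)

# To submit the job (requires AWS credentials and the boto3 package):
#   import boto3
#   boto3.client("sagemaker").create_training_job(**request)
```

Keeping the request as a plain dictionary makes it easy to inspect or version before any cluster is started.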
Automatic Model Tuning
Amazon SageMaker can automatically tune your model by trying thousands of different combinations of algorithm parameters to arrive at the most accurate predictions the model is capable of producing.
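To illustrate the general idea of searching over parameter combinations, here is a toy random search in plain Python. It is a sketch of the concept only and is not presented as the strategy SageMaker applies. The scoring function toy_validation_score, the parameter names, and their ranges are all hypothetical stand-ins for launching a real training job and reading back its validation metric.

```python
import random


def toy_validation_score(learning_rate, max_depth):
    """Hypothetical stand-in for training a model and returning its score.

    The score peaks at learning_rate=0.1 and max_depth=6.
    """
    return 1.0 - 10 * (learning_rate - 0.1) ** 2 - 0.01 * (max_depth - 6) ** 2


def random_search(num_trials, seed=0):
    """Sample parameter combinations at random and keep the best one."""
    rng = random.Random(seed)  # seeded so repeated runs sample the same trials
    trials = []
    for _ in range(num_trials):
        params = {
            "learning_rate": rng.uniform(0.01, 0.3),  # assumed search range
            "max_depth": rng.randint(2, 10),          # assumed search range
        }
        trials.append((toy_validation_score(**params), params))
    # Compare on the score only, so ties never fall back to comparing dicts.
    best_score, best_params = max(trials, key=lambda trial: trial[0])
    return best_score, best_params, trials


best_score, best_params, trials = random_search(num_trials=50)
print(f"best score {best_score:.4f} with {best_params}")
```

A managed tuning service replaces the loop body with real training jobs and can choose each new combination based on earlier results instead of sampling blindly.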
Managed spot training
You can reduce the cost of training your machine learning models by up to 90% using Managed Spot Training. Managed Spot Training uses Amazon EC2 Spot Instances, which run on spare EC2 capacity, so your training jobs cost much less than they would on Amazon EC2 On-Demand Instances. Amazon SageMaker manages the training jobs so that they run as and when compute capacity becomes available. As a result, you don’t have to poll continuously for capacity, and you don’t need to build additional tooling to manage interruptions. Managed Spot Training works with automatic model tuning, the built-in algorithms and frameworks that come with Amazon SageMaker, and custom algorithms.
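As a rough sketch of how this looks in a training job request, the fields below are taken from the SageMaker CreateTrainingJob API: EnableManagedSpotTraining, a StoppingCondition that includes MaxWaitTimeInSeconds, and a CheckpointConfig so an interrupted job can resume. The helper names and the example values are hypothetical. The savings calculation assumes the training and billable durations that SageMaker reports for a completed job.

```python
def spot_training_settings(max_runtime_seconds, max_wait_seconds, checkpoint_s3_uri):
    """Build the spot-related fields of a CreateTrainingJob request."""
    # The wait time covers both waiting for capacity and the run itself,
    # so it cannot be shorter than the maximum runtime.
    if max_wait_seconds < max_runtime_seconds:
        raise ValueError("max_wait_seconds must be >= max_runtime_seconds")
    return {
        "EnableManagedSpotTraining": True,
        "StoppingCondition": {
            "MaxRuntimeInSeconds": max_runtime_seconds,
            "MaxWaitTimeInSeconds": max_wait_seconds,
        },
        # Checkpoints let a job pick up where it left off after an interruption.
        "CheckpointConfig": {"S3Uri": checkpoint_s3_uri},
    }


def spot_savings_percent(training_seconds, billable_seconds):
    """Percentage saved: the share of training time that was not billed."""
    return 100 * (training_seconds - billable_seconds) / training_seconds


# Placeholder values for illustration only.
settings = spot_training_settings(
    max_runtime_seconds=3600,
    max_wait_seconds=7200,
    checkpoint_s3_uri="s3://example-bucket/checkpoints/",
)
savings = spot_savings_percent(training_seconds=500, billable_seconds=100)
print(f"savings: {savings:.0f}%")
```

These fields would be merged into the rest of a training job request; on their own they do not start a job.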