General
Q: What is Amazon SageMaker?
Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly. SageMaker removes the heavy lifting from each step of the machine learning process to make it easier to develop high quality models.
Q: In which regions is Amazon SageMaker available?
For a list of the Amazon Web Services regions that support Amazon SageMaker, see the Amazon Web Services Region Table for all Amazon Web Services global infrastructure. For more information, see Regions and Endpoints in the Amazon Web Services General Reference.
Q: What is the service availability of Amazon SageMaker?
Amazon SageMaker is designed for high availability. There are no maintenance windows or scheduled downtimes. SageMaker APIs run in Amazon’s proven, high-availability data centers, with service stack replication configured across three facilities in each Amazon Web Services region to provide fault tolerance in the event of a server failure or Availability Zone outage.
Q: What security measures does Amazon SageMaker have?
Amazon SageMaker ensures that ML model artifacts and other system artifacts are encrypted in transit and at rest. Requests to the SageMaker API and console are made over a secure (SSL) connection. You pass Amazon Identity and Access Management roles to SageMaker to provide permissions to access resources on your behalf for training and deployment. You can use encrypted S3 buckets for model artifacts and data, as well as pass a KMS key to SageMaker notebooks, training jobs, and endpoints, to encrypt the attached ML storage volume.
Q: How does Amazon SageMaker secure my code?
Amazon SageMaker stores code in ML storage volumes, secured by security groups and optionally encrypted at rest.
Q: How am I charged for Amazon SageMaker?
You pay for ML compute, storage, and data processing resources you use for hosting the notebook, training the model, performing predictions, and logging the outputs. Amazon SageMaker allows you to select the number and type of instance used for the hosted notebook, training, and model hosting. You only pay for what you use, as you use it; there are no minimum fees and no upfront commitments. See the Amazon SageMaker pricing page for details.
Q: What if I have my own notebook, training, or hosting environment?
Amazon SageMaker provides a full end-to-end workflow, but you can continue to use your existing tools with SageMaker. You can easily transfer the results of each stage in and out of SageMaker as your business requirements dictate.
Q: What is Amazon SageMaker Studio?
Amazon SageMaker Studio provides a single, web-based visual interface where you can perform all ML development steps. SageMaker Studio gives you complete access, control, and visibility into each step required to build, train, and deploy models. You can quickly upload data, create new notebooks, train and tune models, move back and forth between steps to adjust experiments, compare results, and deploy models to production all in one place, making you much more productive. All ML development activities including notebooks, experiment management, automatic model creation, debugging and profiling, and model drift detection can be performed within the unified SageMaker Studio visual interface.
Q: What is Amazon SageMaker Autopilot?
Amazon SageMaker Autopilot is the industry’s first automated machine learning capability that gives you complete control and visibility into your ML models. SageMaker Autopilot automatically inspects raw data, applies feature processors, picks the best set of algorithms, trains and tunes multiple models, tracks their performance, and then ranks the models based on performance, all with just a few clicks. The result is the best performing model that you can deploy at a fraction of the time normally required to train the model. You get full visibility into how the model was created and what’s in it and SageMaker Autopilot integrates with Amazon SageMaker Studio. You can explore up to 50 different models generated by SageMaker Autopilot inside SageMaker Studio so it’s easy to pick the best model for your use case. SageMaker Autopilot can be used by people without machine learning experience to easily produce a model or it can be used by experienced developers to quickly develop a baseline model on which teams can further iterate.
Q: How is Amazon SageMaker Autopilot different from vertical AI services like Amazon Personalize and Amazon Forecast?
While Amazon Personalize and Amazon Forecast specifically target personalized recommendation and forecasting use cases, Amazon SageMaker Autopilot is a generic automatic machine learning solution for classification and regression problems, such as fraud detection, churn analysis, and targeted marketing. Personalize and Forecast focus on simplifying the end-to-end experience by offering training and model hosting in a bundle. With Amazon SageMaker Autopilot, you can train models and get full access to the models as well as the pipelines that generated them. You can then deploy the models to the hosting environment of your choice, or iterate further to improve model quality.
Q: What built-in algorithms are supported in Amazon SageMaker Autopilot?
Amazon SageMaker Autopilot supports 2 built-in algorithms at launch: XGBoost and Linear Learner.
Q: Does Amazon SageMaker Autopilot support distributed training?
Yes. All Amazon SageMaker Autopilot built-in algorithms support distributed training out of the box.
Q: Can I stop an Amazon SageMaker Autopilot job manually?
Yes. You can stop a job at any time. When an Amazon SageMaker Autopilot job is stopped, all ongoing trials will be stopped and no new trial will be started.
Build Models
Q: What types of notebooks are supported?
Currently, Jupyter notebooks are supported.
Q: How do Amazon SageMaker Notebooks work?
Amazon SageMaker Notebooks provide one-click Jupyter notebooks that you can start working with in seconds. The underlying compute resources are fully elastic, so you can easily dial the available resources up or down, and the changes take place automatically in the background without interrupting your work. SageMaker also enables one-click sharing of notebooks. All code dependencies are automatically captured, so you can easily collaborate with others. They’ll get the exact same notebook, saved in the same place.
With SageMaker Notebooks you can sign in with your corporate credentials using SSO and start working with notebooks within seconds. Sharing notebooks within and across teams is easy, since the dependencies needed to run a notebook are automatically tracked in environments that are encapsulated with the notebook as it is shared.
Q: How do Amazon SageMaker Notebooks work with other Amazon Web Services services?
Amazon SageMaker Notebooks give you access to all SageMaker features, such as distributed training, batch transform, hosting, and experiment management. You can access other services such as datasets in Amazon S3, Amazon Redshift, Amazon Glue, or Amazon EMR from SageMaker Notebooks.
Q: What is Amazon SageMaker Ground Truth?
Amazon SageMaker Ground Truth provides automated data labeling using machine learning. SageMaker Ground Truth will first select a random sample of data and send it to Amazon Mechanical Turk to be labeled. The results are then used to train a labeling model that attempts to label a new sample of raw data automatically. The labels are committed when the model can label the data with a confidence score that meets or exceeds a threshold you set. Where the confidence score falls below your threshold, the data is sent to human labelers. Some of the data labeled by humans is used to generate a new training dataset for the labeling model, and the model is automatically retrained to improve its accuracy. This process repeats with each sample of raw data to be labeled. The labeling model becomes more capable of automatically labeling raw data with each iteration, and less data is routed to humans.
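The threshold-based routing described above can be illustrated with a minimal sketch. It is not Ground Truth's implementation: the record layout, the function name route_by_confidence, and the sample values are all hypothetical, and the sketch only shows the "meets or exceeds the threshold" decision for one batch of predictions.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (item_id, predicted_label, confidence in [0, 1]).
Prediction = Tuple[str, str, float]


def route_by_confidence(
    predictions: List[Prediction], threshold: float
) -> Tuple[Dict[str, str], List[str]]:
    """Split predictions into auto-committed labels and a queue for human labelers."""
    auto_labeled: Dict[str, str] = {}
    human_queue: List[str] = []
    for item_id, label, confidence in predictions:
        if confidence >= threshold:  # label "meets or exceeds" the threshold
            auto_labeled[item_id] = label
        else:  # below the threshold: send to human labelers
            human_queue.append(item_id)
    return auto_labeled, human_queue


# Illustrative values only.
predictions = [("img-1", "cat", 0.97), ("img-2", "dog", 0.62), ("img-3", "cat", 0.90)]
auto_labeled, human_queue = route_by_confidence(predictions, threshold=0.90)
print(auto_labeled)
print(human_queue)
```

In the process described above, the human-labeled items would then feed the next retraining round of the labeling model.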
Train Models
Q: What is Amazon SageMaker Experiments?
Amazon SageMaker Experiments helps you organize and track iterations to machine learning models. SageMaker Experiments helps you manage iterations by automatically capturing the input parameters, configurations, and results, and storing them as ‘experiments’. You can work within the visual interface of SageMaker Studio, where you can browse active experiments, search for previous experiments by their characteristics, review previous experiments with their results, and compare experiment results visually.
Q: What is Amazon SageMaker Debugger?
Amazon SageMaker Debugger makes the training process more transparent by automatically capturing real-time metrics during training such as training and validation, confusion matrices, and learning gradients to help improve model accuracy.
The metrics from SageMaker Debugger can be visualized in Amazon SageMaker Studio for easy understanding. SageMaker Debugger can also generate warnings and remediation advice when common training problems are detected. With SageMaker Debugger, you can interpret how a model is working, representing an early step towards model explainability.
Q: What is Managed Spot Training?
Managed Spot Training with Amazon SageMaker lets you train your machine learning models using Amazon EC2 Spot instances, while reducing the cost of training your models by up to 90%.
Q: How do I use Managed Spot Training?
You enable the Managed Spot Training option when submitting your training jobs, and you also specify how long you want to wait for Spot capacity. Amazon SageMaker then uses Amazon EC2 Spot instances to run your job and manages the Spot capacity. You have full visibility into the status of your training jobs, both while they are running and while they are waiting for capacity.
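As a rough illustration, the Spot-related part of a training job request might look like the fragment below. The field names follow the CreateTrainingJob API as we understand it; the helper name spot_training_settings and the S3 URI are placeholders, and this is only a fragment, not a complete request.

```python
def spot_training_settings(
    max_runtime_s: int, max_wait_s: int, checkpoint_s3_uri: str
) -> dict:
    """Build the Spot-related fragment of a training job request (sketch only)."""
    # The wait time covers both waiting for capacity and running, so it should
    # not be shorter than the maximum runtime.
    if max_wait_s < max_runtime_s:
        raise ValueError("max_wait_s must be >= max_runtime_s")
    return {
        "EnableManagedSpotTraining": True,
        "StoppingCondition": {
            "MaxRuntimeInSeconds": max_runtime_s,
            "MaxWaitTimeInSeconds": max_wait_s,  # how long to wait for Spot capacity
        },
        # Checkpoints let an interrupted job resume instead of starting over.
        "CheckpointConfig": {"S3Uri": checkpoint_s3_uri},
    }


# Placeholder bucket and prefix, for illustration only.
settings = spot_training_settings(3600, 7200, "s3://example-bucket/checkpoints/")
print(settings)
```

These keys would be merged into the rest of the training job request (algorithm, role, input and output configuration) before submitting it.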
Q: When should I use Managed Spot Training?
Managed Spot Training is ideal when you have flexibility with your training runs and when you want to minimize the cost of your training jobs. With Managed Spot Training, you can reduce the cost of training your machine learning models by up to 90%.
Q: How does Managed Spot Training work?
Managed Spot Training uses Amazon EC2 Spot instances for training, and these instances can be pre-empted when Amazon Web Services needs capacity. As a result, Managed Spot Training jobs can run in small increments as and when capacity becomes available. The training jobs need not be restarted from scratch when there is an interruption as Amazon SageMaker can resume the training jobs using the latest model checkpoint. The built-in frameworks and the built-in computer vision algorithms with SageMaker enable periodic checkpoints, and you can enable checkpoints with custom models.
Q: Do I need to periodically checkpoint with Managed Spot Training?
We recommend periodic checkpoints as a general best practice for long-running training jobs. Checkpoints prevent your Managed Spot Training jobs from restarting from scratch if capacity is pre-empted. When you enable checkpoints, Amazon SageMaker resumes your Managed Spot Training jobs from the last checkpoint.
Q: How do you calculate the cost savings with Managed Spot Training jobs?
Once a Managed Spot Training job is completed, you can see the savings in the Amazon Web Services management console and also calculate the cost savings as the percentage difference between the duration for which the training job ran and the duration for which you were billed.
Regardless of how many times your Managed Spot Training jobs are interrupted, you are charged only once for the duration for which the data was downloaded.
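The savings calculation described above is simple arithmetic; a minimal sketch follows. The two field names match what the DescribeTrainingJob response reports as far as we know, and the numbers are made up for illustration.

```python
def spot_savings_percent(training_seconds: int, billable_seconds: int) -> float:
    """Percentage difference between time the job ran and time that was billed."""
    if training_seconds <= 0:
        raise ValueError("training_seconds must be positive")
    return (1 - billable_seconds / training_seconds) * 100


# Illustrative values; a real job description would supply these fields.
job = {"TrainingTimeInSeconds": 4000, "BillableTimeInSeconds": 1000}
savings = spot_savings_percent(
    job["TrainingTimeInSeconds"], job["BillableTimeInSeconds"]
)
print(f"Managed Spot Training savings: {savings:.1f}%")
```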
Q: Which instances can I use with Managed Spot Training?
Managed Spot Training can be used with all instances supported in Amazon SageMaker.
Q: Which Amazon Web Services regions are supported with Managed Spot Training?
Managed Spot Training is supported on all Amazon Web Services regions where Amazon SageMaker is currently available.
Q: Are there limits to the size of the dataset I can use for training?
There are no fixed limits to the size of the dataset you can use for training models with Amazon SageMaker.
Q: What data sources can I easily pull into Amazon SageMaker?
You can specify the Amazon S3 location of your training data as part of creating a training job.
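For illustration, one input channel of a training job request could be built as in the sketch below. The field names follow the InputDataConfig structure of the CreateTrainingJob API as we understand it; the helper name s3_channel, the bucket, and the content type are assumptions.

```python
def s3_channel(name: str, s3_uri: str, content_type: str = "text/csv") -> dict:
    """Build one input channel pointing at training data in Amazon S3 (sketch)."""
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with s3://")
    return {
        "ChannelName": name,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",  # treat the URI as a key prefix
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
        "ContentType": content_type,
    }


# Placeholder bucket and prefix, for illustration only.
train_channel = s3_channel("train", "s3://example-bucket/data/train/")
print(train_channel)
```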
Q: What algorithms does Amazon SageMaker use to generate models?
Amazon SageMaker includes built-in algorithms for linear regression, logistic regression, k-means clustering, principal component analysis, factorization machines, neural topic modeling, latent Dirichlet allocation, gradient boosted trees, sequence2sequence, time series forecasting, word2vec, and image classification. SageMaker also provides optimized Apache MXNet, TensorFlow, Chainer, PyTorch, Gluon, Keras, Horovod, Scikit-learn, and Deep Graph Library containers. In addition, Amazon SageMaker supports your custom training algorithms provided through a Docker image adhering to the documented specification.
Q: What is Automatic Model Tuning?
Most machine learning algorithms expose a variety of parameters that control how the underlying algorithm operates. Those parameters are generally referred to as hyperparameters and their values affect the quality of the trained models. Automatic model tuning is the process of finding a set of hyperparameters for an algorithm that can yield an optimal model.
Q: What models can be tuned with Automatic Model Tuning?
You can run automatic model tuning in Amazon SageMaker on top of any algorithm as long as it’s scientifically feasible, including built-in SageMaker algorithms, deep neural networks, or arbitrary algorithms you bring to SageMaker in the form of Docker images.
Q: Can I use Automatic Model Tuning outside of Amazon SageMaker?
Not at this time. The best model tuning performance and experience is within Amazon SageMaker.
Q: What is the underlying tuning algorithm?
Currently, our algorithm for tuning hyperparameters is a customized implementation of Bayesian Optimization. It aims to optimize a customer-specified objective metric throughout the tuning process. Specifically, it checks the objective metric of completed training jobs, and uses that knowledge to infer the hyperparameter combination for the next training job.
Q: Will you recommend specific hyperparameters for tuning?
No. How certain hyperparameters impact the model performance depends on various factors and it is hard to definitively say one hyperparameter is more important than the others and thus needs to be tuned. For built-in algorithms within Amazon SageMaker, we do call out whether or not a hyperparameter is tunable.
Q: How long does a hyperparameter tuning job take?
The length of time for a hyperparameter tuning job depends on multiple factors including the size of the data, the underlying algorithm, and the values of the hyperparameters. Additionally, customers can choose the number of simultaneous training jobs and total number of training jobs. All these choices affect how long a hyperparameter tuning job can last.
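As a back-of-the-envelope illustration of how the two job counts interact, the sketch below estimates wall-clock time by assuming jobs run in full rounds of equal length. That assumption, the function name, and the numbers are ours; real tuning jobs have variable training times and startup overhead, so treat the result as a rough guide only.

```python
import math


def estimate_tuning_hours(
    max_jobs: int, max_parallel_jobs: int, avg_job_hours: float
) -> float:
    """Rough wall-clock estimate assuming equal-length jobs run in full rounds."""
    if max_jobs < 1 or max_parallel_jobs < 1:
        raise ValueError("job counts must be at least 1")
    rounds = math.ceil(max_jobs / max_parallel_jobs)
    return rounds * avg_job_hours


# 20 training jobs in total, 4 at a time, about half an hour each.
hours = estimate_tuning_hours(max_jobs=20, max_parallel_jobs=4, avg_job_hours=0.5)
print(hours)
```

Raising the number of simultaneous jobs shortens the estimate, but the tuner then has fewer completed results to learn from when it chooses each new set of hyperparameters.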
Q: Can I optimize multiple objectives simultaneously like a model to be both fast and accurate?
Not at this time. Right now, you need to specify a single objective metric to optimize or change your algorithm code to emit a new metric, which is a weighted average between two or more useful metrics, and have the tuning process optimize towards that objective metric.
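One way to build such a combined metric is sketched below. The weights, the latency budget, the metric name combined_objective, and the log format are all assumptions made for illustration; the regular expression shows how a metric printed by training code could be picked up from the logs by a metric definition.

```python
import re


def combined_objective(
    accuracy: float,
    latency_ms: float,
    accuracy_weight: float = 0.7,
    latency_budget_ms: float = 100.0,
) -> float:
    """Weighted average of accuracy and a latency score, both scaled to [0, 1]."""
    # Map latency to a score where faster is better and the budget scores 0.
    latency_score = max(0.0, 1.0 - latency_ms / latency_budget_ms)
    return accuracy_weight * accuracy + (1.0 - accuracy_weight) * latency_score


# The training code prints the metric to its log output.
log_line = f"combined_objective={combined_objective(0.92, 40.0):.4f}"
print(log_line)

# A regular expression of this shape extracts the value from the log line.
METRIC_REGEX = r"combined_objective=([0-9.]+)"
parsed = float(re.search(METRIC_REGEX, log_line).group(1))
```

The tuning job would then be configured to maximize this single metric.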
Q: How much does Automatic Model Tuning cost?
There is no charge for a hyperparameter tuning job itself. You will be charged for the training jobs that are launched by the hyperparameter tuning job, based on model training pricing.
Q: How do I decide to use Amazon SageMaker Autopilot or Automatic Model Tuning?
Amazon SageMaker Autopilot automates everything in a typical machine learning workflow, including feature preprocessing, algorithm selection, and hyperparameter tuning, while specifically focusing on classification and regression use cases. Automatic Model Tuning, on the other hand, is designed to tune any model, whether it is based on built-in algorithms, deep learning frameworks, or custom containers. In exchange for the flexibility, you have to manually pick the specific algorithm, the hyperparameters to tune, and the corresponding search ranges.
Q: What is reinforcement learning?
Reinforcement learning is a machine learning technique that enables an agent to learn in an interactive environment by trial and error using feedback from its own actions and experiences.
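To make "trial and error using feedback" concrete, here is a minimal sketch of an epsilon-greedy agent on a two-armed bandit, one of the simplest reinforcement learning settings. It is a generic textbook technique, not SageMaker code; the reward probabilities, step count, and function name are illustrative choices.

```python
import random


def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly exploit the best-known action, sometimes explore."""
    rng = random.Random(seed)
    estimates = [0.0] * len(reward_probs)  # running mean reward per action
    counts = [0] * len(reward_probs)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(reward_probs))  # explore
        else:
            action = max(range(len(estimates)), key=estimates.__getitem__)  # exploit
        # The environment returns a reward of 1 or 0 as feedback on the action.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Incremental update of the mean reward for the chosen action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = run_bandit([0.2, 0.8])
best_action = max(range(len(estimates)), key=estimates.__getitem__)
print(best_action, counts)
```

The agent is never told which action is better; it infers that from the rewards it receives.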
Q: Can I train reinforcement learning models in Amazon SageMaker?
Yes, you can train reinforcement learning models in Amazon SageMaker in addition to supervised and unsupervised learning models.
Q: How is reinforcement learning different from supervised learning?
Both supervised and reinforcement learning use a mapping between input and output. In supervised learning, the feedback provided to the agent is the correct set of actions for performing a task. Reinforcement learning instead uses delayed feedback, where reward signals are optimized to achieve a long-term goal through a sequence of actions.
Q: When should I use reinforcement learning?
The goal of supervised learning techniques is to find the right answer based on the patterns in the training data, and the goal of unsupervised learning techniques is to find similarities and differences between data points. In contrast, the goal of reinforcement learning (RL) techniques is to learn how to achieve a desired outcome even when it is not clear how to accomplish that outcome. As a result, RL is better suited to intelligent applications where an agent can make autonomous decisions, such as robotics, autonomous vehicles, HVAC, industrial control, and more.
Q: What type of environments can I use for training reinforcement learning models?
Amazon SageMaker RL supports a number of different environments for training reinforcement learning models. You can use Amazon Web Services services such as Amazon RoboMaker, open source environments or custom environments developed using OpenAI Gym interfaces, or commercial simulation environments such as MATLAB and Simulink.
Q: Do I need to write my own RL agent algorithms to train reinforcement learning models?
No, Amazon SageMaker RL includes RL toolkits such as Coach and Ray RLlib that offer implementations of RL agent algorithms such as DQN, PPO, A3C, and many more.
Q: Can I bring my own RL libraries and algorithm implementation and run in Amazon SageMaker RL?
Yes, you can bring your own RL libraries and algorithm implementations in Docker Containers and run those in Amazon SageMaker RL.
Q: Can I do distributed rollouts using Amazon SageMaker RL?
Yes. You can even select a heterogeneous cluster where the training can run on a GPU instance and the simulations can run on multiple CPU instances.
Deploy Models
Q: What is Amazon SageMaker Model Monitor?
Amazon SageMaker Model Monitor allows developers to detect and remediate concept drift. SageMaker Model Monitor automatically detects concept drift in deployed models and provides detailed alerts that help identify the source of the problem. All models trained in SageMaker automatically emit key metrics that can be collected and viewed in SageMaker Studio. From inside SageMaker Studio you can configure data to be collected, how to view it, and when to receive alerts.
Q: Can I access the infrastructure that Amazon SageMaker runs on?
No. Amazon SageMaker operates the compute infrastructure on your behalf, allowing it to perform health checks, apply security patches, and do other routine maintenance. You can also deploy the model artifacts from training with custom inference code in your own hosting environment.
Q: How do I scale the size and performance of an Amazon SageMaker model once in production?
Amazon SageMaker hosting automatically scales to the performance needed for your application using Application Auto Scaling. In addition, you can manually change the instance number and type without incurring downtime through modifying the endpoint configuration.
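As a sketch of what the Application Auto Scaling setup for an endpoint variant can look like, the function below builds the two request payloads involved: one to register the variant as a scalable target and one for a target-tracking policy. The field names and the predefined metric follow the Application Auto Scaling API as we understand it; the function name, endpoint name, variant name, and numbers are placeholders.

```python
def endpoint_scaling_config(
    endpoint_name: str,
    variant_name: str,
    min_instances: int,
    max_instances: int,
    invocations_per_instance: float,
):
    """Build scalable-target and scaling-policy payloads for one variant (sketch)."""
    if not 1 <= min_instances <= max_instances:
        raise ValueError("need 1 <= min_instances <= max_instances")
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"
    scalable_target = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_instances,
        "MaxCapacity": max_instances,
    }
    scaling_policy = {
        "PolicyName": f"{endpoint_name}-invocations-target",
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Keep average invocations per instance near this value.
            "TargetValue": float(invocations_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
        },
    }
    return scalable_target, scaling_policy


# Placeholder endpoint and variant names, for illustration only.
target, policy = endpoint_scaling_config("my-endpoint", "AllTraffic", 1, 4, 100)
print(target["ResourceId"])
```

The first payload would go to the call that registers a scalable target and the second to the call that attaches a scaling policy.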
Q: How do I monitor my Amazon SageMaker production environment?
Amazon SageMaker emits performance metrics to Amazon CloudWatch Metrics so you can track metrics, set alarms, and automatically react to changes in production traffic. In addition, Amazon SageMaker writes logs to Amazon CloudWatch Logs to let you monitor and troubleshoot your production environment.
Q: What kinds of models can be hosted with Amazon SageMaker?
Amazon SageMaker can host any model that adheres to the documented specification for inference Docker images. This includes models created from Amazon SageMaker model artifacts and inference code.
Q: How many concurrent real-time API requests does Amazon SageMaker support?
Amazon SageMaker is designed to scale to a large number of transactions per second. The precise number varies based on the deployed model and the number and type of instances to which the model is deployed.
Q: What is Batch Transform?
Batch Transform enables you to run predictions on large or small batch data. There is no need to break the data set into multiple chunks or manage real-time endpoints. With a simple API, you can request predictions for a large number of data records and transform the data quickly and easily.
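For illustration, a batch transform request could be assembled as in the sketch below. The field names follow the CreateTransformJob API as we understand it; the helper name, job and model names, S3 locations, content type, and instance type are placeholders.

```python
def batch_transform_request(
    job_name: str,
    model_name: str,
    input_s3_uri: str,
    output_s3_uri: str,
    instance_type: str = "ml.m5.xlarge",
    instance_count: int = 1,
) -> dict:
    """Build a request payload for a batch transform job (sketch only)."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,  # an existing SageMaker model
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3_uri}
            },
            "ContentType": "text/csv",
            "SplitType": "Line",  # treat each line as one record
        },
        "TransformOutput": {"S3OutputPath": output_s3_uri},
        "TransformResources": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
    }


# Placeholder names and locations, for illustration only.
request = batch_transform_request(
    "example-batch-job",
    "example-model",
    "s3://example-bucket/batch/input/",
    "s3://example-bucket/batch/output/",
)
print(request["TransformJobName"])
```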
Q: What is Amazon SageMaker Neo?
Amazon SageMaker Neo enables machine learning models to train once and run anywhere in the cloud and at the edge. SageMaker Neo automatically optimizes models built with popular deep learning frameworks that can be used to deploy on multiple hardware platforms. Optimized models run up to two times faster and consume less than a tenth of the resources of typical machine learning models.
Q: How do I get started with Amazon SageMaker Neo?
To get started with Amazon SageMaker Neo, you log into the Amazon SageMaker console, choose a trained model, follow the example to compile models, and deploy the resulting model onto your target hardware platform.
Q: What are the major components of Amazon SageMaker Neo?
Amazon SageMaker Neo contains two major components: a compiler and a runtime. First, the Neo compiler reads models exported by different frameworks. It then converts the framework-specific functions and operations into a framework-agnostic intermediate representation. Next, it performs a series of optimizations. Then, the compiler generates binary code for the optimized operations and writes it to a shared object library. The compiler also saves the model definition and parameters into separate files. During execution, the Neo runtime loads the artifacts generated by the compiler (the model definition, the parameters, and the shared object library) to run the model.
Q: Do I need to use Amazon SageMaker to train my model in order to use Amazon SageMaker Neo to convert the model?
No. You can train models elsewhere and use Neo to optimize them for Amazon SageMaker ML instances or Amazon IoT Greengrass supported devices.
Q: Which models does Amazon SageMaker Neo support?
Currently, Amazon SageMaker Neo supports the most popular deep learning models that power computer vision applications and the most popular decision tree models used in Amazon SageMaker today. Neo optimizes the performance of AlexNet, ResNet, VGG, Inception, MobileNet, SqueezeNet, and DenseNet models trained in MXNet and TensorFlow, and classification and random cut forest models trained in XGBoost.
Q: Which platforms does Amazon SageMaker Neo support?
Currently, Neo supports SageMaker ML.C5, ML.C4, ML.M5, ML.M4, ML.P3, and ML.P2 instances; Raspberry Pi and Jetson TX1 and TX2 devices; and Greengrass devices based on Intel® Atom and Intel® Xeon CPUs, ARM Cortex-A CPUs, and Nvidia Maxwell and Pascal GPUs.
Q: Do I need to use a specific version of a framework that is supported on the target hardware?
No. Developers can run models using the Amazon SageMaker Neo container without dependencies on the framework.
Q: How much does it cost to use Amazon SageMaker Neo?
You pay for the use of the Amazon SageMaker ML instance that runs inference using Amazon SageMaker Neo.
Q: In which Amazon Web Services regions is Amazon SageMaker Neo available?
To see a list of supported regions, view the Amazon Web Services region table.