Amazon SageMaker Neo
Train models once, run anywhere with up to 2x performance improvement
Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly. SageMaker removes the heavy lifting from each step of the machine learning process to make it easier to develop high quality models.
Developers spend a lot of time and effort delivering accurate machine learning models that can make fast, low-latency predictions in real time. This matters most on edge devices, where memory and processing power tend to be highly constrained but latency is critical. For example, sensors in autonomous vehicles typically need to process data in a thousandth of a second to be useful, so a round trip to the cloud and back isn’t possible. Edge devices also span a wide array of hardware platforms and processor architectures, and achieving high performance means spending weeks or months hand-tuning a model for each one. Because the tuning process is so complex, models are rarely updated after they are deployed to the edge, and developers miss the opportunity to retrain and improve them based on the data the edge devices collect.
Amazon SageMaker Neo automatically optimizes machine learning models to perform at up to twice the speed with no loss in accuracy. You start with a machine learning model built using MXNet, TensorFlow, PyTorch, or XGBoost and trained using Amazon SageMaker. You then choose a target hardware platform from Intel, NVIDIA, or ARM. With a single click, SageMaker Neo compiles the trained model into an executable. The compiler uses a neural network to discover and apply the performance optimizations that make your model run most efficiently on the target hardware platform. The compiled model can then be deployed to make predictions in the cloud or at the edge. Amazon IoT Greengrass brings local compute and ML inference capabilities to the edge, and it supports Neo-optimized models, so you can deploy your models directly to edge devices with over-the-air updates.
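The same workflow can also be driven from code instead of the console. The following is a minimal sketch, not a definitive implementation: it assembles the arguments for the SageMaker CreateCompilationJob operation (exposed in boto3 as create_compilation_job) and prints them. The job name, IAM role ARN, S3 locations, input name and shape, and the helper names build_compilation_request and submit are all hypothetical placeholders chosen for illustration.

```python
import json


def build_compilation_request(job_name, role_arn, model_s3_uri, output_s3_uri,
                              framework, input_shape, target_device,
                              max_runtime_seconds=900):
    """Assemble keyword arguments for a SageMaker compilation job."""
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,
            # DataInputConfig is passed as a JSON string that maps
            # each model input name to its shape.
            "DataInputConfig": json.dumps(input_shape),
            "Framework": framework,
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            "TargetDevice": target_device,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_seconds},
    }


def submit(request, region_name="us-east-1"):
    """Send the request. Needs boto3 and AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the sketch runs without boto3 installed

    client = boto3.client("sagemaker", region_name=region_name)
    return client.create_compilation_job(**request)


# All values below are hypothetical placeholders.
request = build_compilation_request(
    job_name="neo-demo-job",
    role_arn="arn:aws:iam::123456789012:role/ExampleNeoRole",
    model_s3_uri="s3://example-bucket/model/model.tar.gz",
    output_s3_uri="s3://example-bucket/compiled/",
    framework="PYTORCH",
    input_shape={"input0": [1, 3, 224, 224]},
    target_device="jetson_nano",
)
print(json.dumps(request, indent=2))
```

Calling submit(request) with valid credentials and real S3 locations would start the job. Keeping the request builder separate from the network call makes the configuration easy to inspect and test offline.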
Neo uses Apache TVM and other partner-provided compilers and kernel libraries. Neo is available as open source code as the Neo-AI project under the Apache Software License, enabling developers to customize the software for different devices and applications.
Benefits
Run ML models with up to 2x better performance
Amazon SageMaker Neo automatically optimizes TensorFlow, MXNet, PyTorch, and XGBoost machine learning models to perform at up to twice the speed with no loss in accuracy. Using deep learning, SageMaker Neo discovers and applies code optimizations for your specific model and the hardware you intend to deploy the model on. You get the performance benefits of manual tuning without the weeks of effort.
Reduce framework size by 10x
Amazon SageMaker Neo reduces the set of software operations in your model’s framework to only those required for it to make predictions. Typically, this provides a 10x reduction in the amount of memory required by the framework. The model and framework are then compiled into a single executable that can be deployed in production to make fast, low-latency predictions.
Run the same ML model on multiple hardware platforms
Amazon SageMaker Neo allows you to train your model once and run it virtually anywhere with a single executable. Neo understands how to optimize your model for Intel, NVIDIA, ARM, Cadence, Qualcomm, and Xilinx processor architectures automatically, which makes preparing your model for multiple platforms as easy as a few clicks in the Amazon SageMaker console.
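When the same trained model is prepared for several platforms through the API rather than the console, each compilation job takes a single target device, so one job is planned per target from the same model artifact. The sketch below only builds that plan and makes no AWS calls. The helper name plan_compilations, the bucket paths, and the chosen targets (ml_c5 for an Intel-based cloud instance, jetson_nano for NVIDIA, rasp3b for ARM) are assumptions for illustration.

```python
def plan_compilations(base_name, model_s3_uri, output_prefix, targets):
    """Return one compilation plan per target device for a single trained model."""
    plans = []
    for target in targets:
        plans.append({
            # Replace underscores so the job name uses only letters, digits, and hyphens.
            "CompilationJobName": f"{base_name}-{target.replace('_', '-')}",
            # Every plan points at the same trained model artifact.
            "ModelS3Uri": model_s3_uri,
            "OutputConfig": {
                # Give each target its own output folder.
                "S3OutputLocation": f"{output_prefix.rstrip('/')}/{target}/",
                "TargetDevice": target,
            },
        })
    return plans


# Hypothetical model location and targets.
plans = plan_compilations(
    base_name="neo-demo",
    model_s3_uri="s3://example-bucket/model/model.tar.gz",
    output_prefix="s3://example-bucket/compiled/",
    targets=["ml_c5", "jetson_nano", "rasp3b"],
)
for plan in plans:
    print(plan["CompilationJobName"], "->", plan["OutputConfig"]["S3OutputLocation"])
```

Each plan supplies the job name and output settings for one compilation request. The model is trained once and only the target-specific settings change.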
Key Features
Use the deep learning framework you prefer
Amazon SageMaker Neo converts the framework-specific functions and operations for TensorFlow, MXNet, and PyTorch into a single compiled executable that can be run anywhere. Neo compiles and generates the required software code automatically.
Easy and Efficient Software Operations
Amazon SageMaker Neo outputs an executable that can be deployed on cloud instances and edge devices. The Neo runtime reduces the use of resources such as storage on the deployment platform by 10x and eliminates the dependency on frameworks. For example, the Neo runtime occupies 2.5 MB of storage, whereas framework-dependent deployments can occupy up to 1 GB.
Open Source Software
Neo is available as open source code as the Neo-AI project under the Apache Software License. This enables developers and hardware vendors to customize applications and hardware platforms, and take advantage of Neo’s optimization and reduced resource usage techniques.