Posted On: Oct 26, 2021

Amazon EC2 Inf1 instances and Amazon Neuron now support the YOLOv5 and ResNeXt deep learning models, as well as the latest open-source Hugging Face Transformers. We have also optimized the Neuron compiler to enhance performance, and you can now achieve out-of-the-box throughput up to 12x higher than comparable GPU-based instances for pre-trained BERT base models. These enhancements enable you to meet your high-performance inference requirements and deploy state-of-the-art deep learning models at low cost.

EC2 Inf1 instances are powered by Amazon Inferentia, a custom chip built by Amazon to accelerate machine learning inference. These instances deliver up to 2.3x higher throughput and up to 70% lower cost per inference than comparable current-generation GPU-based Amazon EC2 instances. You can train your models in popular machine learning frameworks such as TensorFlow, PyTorch, and MXNet, and deploy them on EC2 Inf1 instances using the Neuron SDK. Because Neuron is integrated with these frameworks, you can deploy your existing models to Inf1 instances with minimal code changes. This gives you the freedom to maintain hardware portability and take advantage of the latest technologies without being tied to a vendor-specific solution.

Inf1 instances are available in 23 Amazon Web Services Regions worldwide, including the Amazon Web Services China (Beijing) Region, operated by Sinnet, and the Amazon Web Services China (Ningxia) Region, operated by NWCD. Our engineering investments, coupled with our scale and our time-tested ability to manage capacity, allow us to identify cost savings and pass them on to our customers. To help you further scale your deep learning applications in production on Amazon EC2 Inf1 instances, we are announcing a 38% reduction in our On-Demand (OD) prices, effective June 1st, 2021. For customers who want to take advantage of Reserved Instances (RI) to lower their costs further, we are reducing our 1-year RI prices by 38% and our 3-year RI prices by 31%. These lower prices also apply to customers who use EC2 Inf1 instances through container orchestration services such as Amazon ECS or Amazon EKS.
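To see what these percentage reductions mean in practice, the short sketch below applies them to a hypothetical hourly rate. The rate used is an assumption for illustration only, not an actual Inf1 price; check the EC2 pricing page for current rates.

```python
# Illustration only: apply the announced price reductions to a
# HYPOTHETICAL On-Demand hourly rate (not an actual Inf1 price).
hypothetical_od_rate = 0.50   # assumed USD/hour before the reduction

od_cut = 0.38      # 38% On-Demand price reduction
ri_1yr_cut = 0.38  # 38% 1-year Reserved Instance price reduction
ri_3yr_cut = 0.31  # 31% 3-year Reserved Instance price reduction

new_od_rate = hypothetical_od_rate * (1 - od_cut)
print(f"Hypothetical On-Demand rate after 38% cut: ${new_od_rate:.3f}/hour")

# A 30-day month of continuous use at the new hypothetical rate:
monthly = new_od_rate * 24 * 30
print(f"Hypothetical monthly cost: ${monthly:.2f}")
```

At the assumed $0.50/hour starting rate, the 38% cut yields $0.31/hour, or $223.20 for a 30-day month of continuous use.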

Amazon EC2 Inf1 instances are available in four sizes, providing up to 16 Inferentia chips, 96 vCPUs, 192 GB of memory, 100 Gbps of networking bandwidth, and 19 Gbps of Amazon Elastic Block Store (EBS) bandwidth. These instances are purchasable On-Demand, as Reserved Instances, or as Spot Instances.

To learn more, visit the Amazon EC2 Inf1 instance page.