Availability and Disaster Recovery for NVIDIA Omniverse Enterprise Nucleus
NVIDIA Omniverse is a revolutionary platform, which allows creators and organizations to collaborate in real-time on 3D designs and simulations. It offers a wide range of integrations and tools to enable teams to work together, and to bring their ideas to life.
One of the fundamental aspects of NVIDIA Omniverse is the ability to author content in your traditional application. Omniverse has connections to popular CAD tools such as Autodesk Revit and PTC Creo, as well as content creation tools such as Autodesk 3ds Max, Autodesk Maya, and Blender. See a full list here:
NVIDIA Omniverse Nucleus is the database and collaboration engine of the Omniverse platform. With Omniverse Nucleus, teams can have multiple live users connected using different applications at once. Nucleus enables efficient live synchronization between NVIDIA Omniverse applications. Changes to Universal Scene Description (USD) files, the core Omniverse data format, are transmitted in real-time between connected Omniverse clients.
As companies look to leverage NVIDIA Omniverse to drive their digital innovation, it is important to consider where, and how, the Nucleus server is configured. With many teams and companies spread throughout a country, or globally, it is important to understand why it’s ideal to deploy Nucleus in the cloud, and how to ensure quick recovery in the event of a server failure.
Deploying Omniverse Enterprise Nucleus on Amazon Web Services with SoftServe
As a member of the NVIDIA Service Delivery Partner – Professional Services (SDP-PS) program,
The SoftServe professional services team works with customers to set up Nucleus cloud deployments by automating and provisioning Amazon Web Services resources, such as
Solution Overview
- End users of the Omniverse tools are supported by on-premises graphics workstations. These workstations have high-end NVIDIA GPUs, the Omniverse clients, and additional Digital Content Creation tools connected using Nucleus Connectors.
- Depending on network security requirements, the Amazon Web Services component of this hybrid deployment can be privately connected to the on-premises network via a VPN connection or an Amazon Web Services Direct Connect connection. A managed private certificate authority can be deployed with Amazon Route 53 for private DNS resolution. Amazon Virtual Private Cloud (Amazon VPC) private link endpoints maintain private communication between Amazon EC2 instances and services such as Amazon Web Services Systems Manager Agent (SSM Agent), Amazon S3, and Amazon CloudWatch.
- An Application Load Balancer (ALB) is deployed in public subnets to redirect client requests from HTTP to HTTPS and then to the NGINX reverse proxy servers. The ALB also balances traffic load across the reverse proxy servers if multiple have been provisioned.
- The reverse proxy is an NGINX server deployed in a highly available multi-AZ auto scaling group. The reverse proxy routes requests based on paths to the specific Nucleus ports.
- The Nucleus server consists of Docker containers orchestrated by a Docker Compose stack provided by NVIDIA. Nucleus data is stored on Amazon Elastic Block Store (Amazon EBS) volumes.
- When deployed, Amazon Web Services Systems Manager Run Commands pull the Nucleus Docker container images from the NVIDIA Container Registry and configure the Nucleus instance on Amazon EC2.
- Access to the NVIDIA Container Registry is required for Docker to pull the appropriate images.
- Auto scaling lifecycle hooks, backed by Amazon Web Services Lambda, support runtime configuration of the NGINX proxy instances when they scale up and when the instances terminate.
- Triggered by the Nucleus ASG On Terminate Lifecycle Hook, the Nucleus failover procedure uses Amazon Web Services Step Functions to pull the Nucleus backup data from Amazon S3 and reconfigure the newly launched EC2 instance. A downtime of a few minutes is expected while the new EC2 instance is launched and configured.
- Triggered periodically by Amazon EventBridge, the Nucleus backup procedure uses Amazon Web Services Step Functions and the NVIDIA nucleus-tools to perform incremental backups of the Nucleus data to Amazon S3.
- CloudWatch aggregates logs from the Amazon EC2 instances and facilitates metric monitoring and alarms. The Nucleus stack also exposes metrics about its load characteristics (such as the number of requests per user and per request type) in a form that Prometheus can consume.
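The path-based routing performed by the reverse proxy can be illustrated with a short Python sketch. This is not the solution's NGINX configuration: the path prefixes and port numbers in ROUTES, and the resolve_port helper itself, are hypothetical placeholders chosen only to show how a request path might be mapped to a backend port.

```python
from typing import Dict, Optional

# Hypothetical path-prefix -> upstream port table. The prefixes and port
# numbers are placeholders for illustration, not the values used by Nucleus.
ROUTES: Dict[str, int] = {
    "/api": 9001,
    "/auth": 9002,
    "/files": 9003,
}


def resolve_port(path: str, routes: Dict[str, int] = ROUTES) -> Optional[int]:
    """Return the upstream port for the longest matching path prefix."""
    best = None
    for prefix in routes:
        # Match only on a path-segment boundary, so "/apiary" is not "/api".
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    return routes[best] if best is not None else None
```

Requests that match no prefix return None, which a real proxy would typically answer with an error response rather than forwarding.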
High Availability
Production teams expect reliable and consistent access to the data stored in Nucleus. To address this expectation, features of high availability have been implemented in this solution.
Using an ALB, Route 53 requests are sent to a single DNS host name and dynamically routed across multiple Availability Zones (AZs). To ensure encrypted connections, an
NGINX reverse proxy servers route traffic to specific ports on the Nucleus server. An
The maximum number of reverse proxy instances, the scaling mechanism, and the number of AZs to scale across are configurable to ensure high availability for each use case.
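As a rough illustration of these configurable settings, the Python sketch below models them as a small validated object. The ProxyScalingConfig class, its field names, and the rule requiring at least two AZs are assumptions made for this example; the actual parameter names used by the solution may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ProxyScalingConfig:
    """Hypothetical settings for the reverse proxy Auto Scaling group."""

    min_instances: int
    max_instances: int
    availability_zones: Tuple[str, ...]

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means no issue was found."""
        problems = []
        if self.min_instances < 1:
            problems.append("min_instances must be at least 1")
        if self.max_instances < self.min_instances:
            problems.append("max_instances must be >= min_instances")
        # Assumption: spreading across fewer than two AZs defeats the purpose.
        if len(set(self.availability_zones)) < 2:
            problems.append("use at least two distinct AZs")
        return problems
```

Checking the configuration before provisioning surfaces mistakes such as a maximum below the minimum, or a single AZ, before any resources are created.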
Backup and Restore
The Omniverse Nucleus on Amazon Web Services solution implements backup procedures at different levels:
- Snapshots of Amazon EBS volumes
- Copy and transfer of the Nucleus data to an Amazon S3 Bucket
These backup features are configurable and automated by an Amazon Web Services Step Functions state machine, which is triggered by a Lambda function on a configurable schedule. Using the NVIDIA nucleus-tools, incremental copies of the Nucleus data are synchronized with the Amazon S3 bucket. Because the backups are incremental, it is best to run them frequently: each run then transfers less data, and less work is at risk between recovery points.
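The general idea behind an incremental backup can be sketched in a few lines of Python. This is a generic illustration and not how the NVIDIA nucleus-tools work internally: the fingerprint and plan_incremental_backup functions, and the use of SHA-256 content hashes in a manifest, are assumptions made for this example.

```python
import hashlib
from typing import Dict, List, Tuple


def fingerprint(data: bytes) -> str:
    """Content hash used to detect changed files."""
    return hashlib.sha256(data).hexdigest()


def plan_incremental_backup(
    current: Dict[str, bytes], manifest: Dict[str, str]
) -> Tuple[List[str], List[str], Dict[str, str]]:
    """Compare current files against the manifest of the previous backup.

    Returns (paths to upload, paths to delete, manifest for the next run).
    """
    new_manifest = {path: fingerprint(data) for path, data in current.items()}
    # Upload anything that is new or whose content hash has changed.
    to_upload = sorted(p for p, h in new_manifest.items() if manifest.get(p) != h)
    # Remove anything that was backed up before but no longer exists.
    to_delete = sorted(p for p in manifest if p not in new_manifest)
    return to_upload, to_delete, new_manifest
```

Only files that changed since the previous manifest are selected for transfer, which is why short intervals between runs keep each transfer small.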
Disaster Recovery
When managing centralized datastores such as the Omniverse Nucleus collaboration engine for digital assets, companies need to protect the continuity of the business and avoid work disruptions.
To maintain a Recovery Time Objective (RTO) of a few minutes, this solution implements incremental Nucleus data backups and automated configuration procedures. This includes periodic, incremental backups of the Nucleus data to an Amazon S3 bucket, as well as serverless processes that use Amazon Web Services Lambda, Auto Scaling groups, and Amazon Web Services Step Functions to automatically launch and reconfigure Nucleus instances running on Amazon EC2.
When an instance failure is detected by the Nucleus Auto Scaling Group, a new instance is automatically launched and the failover Step Function procedure starts. The Step Function procedure pulls the Nucleus backup from S3 and, with Amazon Web Services Systems Manager and the NVIDIA nucleus-tools, uploads the data into the new Nucleus instance.
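One way the hand-off from the Auto Scaling group to the failover procedure could look is sketched below in Python. It assumes the terminate lifecycle hook arrives as an EventBridge-style event with a "detail" section; the failover_input function and the output key names are hypothetical and are not taken from the solution's code.

```python
from typing import Optional

TERMINATING = "autoscaling:EC2_INSTANCE_TERMINATING"


def failover_input(event: dict) -> Optional[dict]:
    """Build the input for a failover run from a lifecycle event.

    Returns None for events that are not instance-terminate transitions.
    """
    detail = event.get("detail", {})
    if detail.get("LifecycleTransition") != TERMINATING:
        return None
    # Output keys are hypothetical names chosen for this sketch.
    return {
        "asg_name": detail["AutoScalingGroupName"],
        "terminated_instance_id": detail["EC2InstanceId"],
        "hook_name": detail["LifecycleHookName"],
        "action_token": detail["LifecycleActionToken"],
    }
```

A Lambda handler could pass the returned dictionary as the execution input of the failover state machine and ignore events for which the function returns None.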
This approach allows customers to recover quickly from unexpected incidents that affect the availability of the Nucleus server. The recovery process is configurable and works with a health check and Lambda functions to implement the failover process.
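The failover sequence described above can be pictured as a small state machine. The Python sketch below builds an illustrative definition in the style of the Amazon States Language and walks it; the state names, the placeholder ARNs, and the execution_order helper are assumptions for this example and do not reproduce the solution's actual state machine.

```python
import json
from typing import List

# Placeholder ARN prefix; a real deployment references its own functions.
PLACEHOLDER_ARN = "arn:aws:lambda:REGION:ACCOUNT_ID:function:"

# Illustrative definition only, not the solution's actual state machine.
FAILOVER_DEFINITION = {
    "Comment": "Illustrative Nucleus failover flow",
    "StartAt": "WaitForInstance",
    "States": {
        "WaitForInstance": {
            "Type": "Task",
            "Resource": PLACEHOLDER_ARN + "wait-for-instance",
            "Next": "RestoreBackup",
        },
        "RestoreBackup": {
            "Type": "Task",
            "Resource": PLACEHOLDER_ARN + "restore-backup",
            "Next": "HealthCheck",
        },
        "HealthCheck": {
            "Type": "Task",
            "Resource": PLACEHOLDER_ARN + "health-check",
            "End": True,
        },
    },
}


def execution_order(definition: dict) -> List[str]:
    """Follow the linear Next chain from StartAt and return the state names."""
    order: List[str] = []
    name = definition["StartAt"]
    while True:
        if name in order:
            raise ValueError("cycle detected at state " + name)
        order.append(name)
        state = definition["States"][name]
        if state.get("End"):
            return order
        name = state["Next"]


def to_json(definition: dict) -> str:
    """Serialize the definition as a JSON document."""
    return json.dumps(definition, indent=2)
```

Keeping each step as a separate state makes it straightforward to add retries or a failure branch per step without changing the others.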
Infrastructure as Code
One of the key objectives of building the Omniverse Nucleus on Amazon Web Services reference architecture is to allow customers to provision the Nucleus server in an automated fashion by using
This solution also deploys an
Conclusion
With Amazon Web Services, customers can connect distributed users all over the globe to NVIDIA Omniverse Enterprise Nucleus. With the breadth and depth of Amazon Web Services, high availability and disaster recovery techniques can be implemented for Nucleus deployed on Amazon Web Services. This includes load balancing, auto scaling, and the backup and restore of data in Nucleus. Together, these measures help teams collaborate in real time with reliable access to their data.
Working alongside SoftServe professional services teams, customers can quickly deploy Nucleus in their Amazon Web Services accounts and customize the solution for their business needs.
For a technical deep dive, please review this open-source solution from Amazon Web Services and SoftServe:
SoftServe – Amazon Web Services Premier Partner
As an Amazon Web Services Premier Tier Services Partner, SoftServe consistently helps customers to implement repeatable solutions in the Amazon Web Services cloud through deep industry experience, innovation, and advanced technologies.
SoftServe can help you to transform your 3D workflows and enable your teams to achieve a new level of collaboration in 3D production quality with NVIDIA Omniverse Enterprise.
For more information about SoftServe and NVIDIA Omniverse Enterprise, please go to our website: