Amazon DataSync is an online data movement and discovery service that simplifies and accelerates data migrations to Amazon Web Services and helps you move data quickly and securely between on-premises storage, edge locations, other clouds, and Amazon Web Services Storage.
Data Movement
For online data transfers, Amazon DataSync simplifies, automates, and accelerates copying large amounts of data between on-premises storage, edge locations, or other clouds, and Amazon Web Services Storage services, as well as between Amazon Web Services Storage services. DataSync can copy data between Network File System (NFS) shares, Server Message Block (SMB) shares, Hadoop Distributed File Systems (HDFS), self-managed object storage, Google Cloud Storage, Azure Files, Azure Blob Storage including Azure Data Lake Storage Gen2, Amazon Simple Storage Service (Amazon S3), Amazon Elastic File System (Amazon EFS) file systems, Amazon FSx for Windows File Server file systems, and Amazon FSx for Lustre file systems.
Amazon DataSync provides the following features for data movement.
Purpose-Built Network Protocol
Amazon DataSync employs an Amazon-designed transfer protocol, decoupled from the storage protocol, to accelerate data movement. The protocol optimizes how, when, and what data is sent over the network. These optimizations include incremental transfers, in-line compression, and sparse file detection, as well as in-line data validation and encryption.
Connections between the local DataSync agent and the in-cloud service components are multi-threaded, maximizing performance over your Wide Area Network (WAN). A single DataSync task is capable of fully utilizing a 10 Gbps network link between your on-premises environment and Amazon Web Services.
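To make the idea of an incremental transfer concrete, the following is a minimal sketch that compares two directory listings and selects only new or changed files. It is an illustration under stated assumptions, not the DataSync protocol itself: the `Listing` type, the `files_to_transfer` helper, and the choice of size and modification time as the change signal are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical listing format: path -> (size in bytes, mtime in seconds).
Listing = Dict[str, Tuple[int, int]]


def files_to_transfer(source: Listing, destination: Listing) -> List[str]:
    """Return source paths that are missing from, or differ at, the destination."""
    return sorted(
        path
        for path, attributes in source.items()
        if destination.get(path) != attributes
    )


source_listing = {"a.txt": (10, 100), "b.txt": (20, 200), "c.txt": (5, 300)}
destination_listing = {"a.txt": (10, 100), "b.txt": (20, 150)}

# b.txt changed (different mtime) and c.txt is new; a.txt is skipped.
pending = files_to_transfer(source_listing, destination_listing)
print(pending)
```

Only the files in `pending` would need to cross the network, which is the saving an incremental transfer is meant to provide.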
Bandwidth Optimization and Control
Transferring hot or cold data should not impede your business. DataSync is equipped with granular controls to optimize bandwidth consumption. Let transfers use up to 10 Gbps during off hours, and set lower limits when network capacity is needed elsewhere.
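A bandwidth limit is typically expressed in bytes per second. The sketch below converts a limit in megabits per second into a task-options fragment. It assumes the DataSync task option is named `BytesPerSecond` and that `-1` means no limit; the `bandwidth_option` helper is hypothetical, so check the option name and semantics against the DataSync API reference before relying on it.

```python
from typing import Optional


def bandwidth_option(limit_mbps: Optional[float]) -> dict:
    """Build a task-options fragment that caps transfer bandwidth.

    limit_mbps is in megabits per second; None means "no limit".
    """
    if limit_mbps is None:
        # Assumption: -1 is the value for unlimited bandwidth.
        return {"BytesPerSecond": -1}
    if limit_mbps <= 0:
        raise ValueError("limit_mbps must be positive")
    # 1 megabit = 1,000,000 bits, and there are 8 bits per byte.
    return {"BytesPerSecond": int(limit_mbps * 1_000_000 / 8)}


business_hours = bandwidth_option(100)  # cap at 100 Mbps during the day
off_hours = bandwidth_option(None)      # remove the cap overnight
print(business_hours, off_hours)
```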
Data Transfer Scheduling
DataSync comes with a built-in scheduling mechanism, allowing you to periodically run data transfer tasks to detect and copy changes from your source storage system to the destination. You can schedule your tasks using the Amazon DataSync Console or Amazon Command Line Interface (CLI) without writing scripts to manage repeated transfers. Task scheduling automatically runs tasks on your configured schedule with hourly, daily, or weekly options provided directly in the Amazon Web Services Console.
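For scheduling through the API rather than the console, the hourly, daily, and weekly options can be sketched as schedule expressions. This example assumes the task schedule accepts cron or rate expressions in the six-field format (minutes, hours, day of month, month, day of week, year) and that the parameter is named `ScheduleExpression`; the `schedule_expression` helper is hypothetical.

```python
def schedule_expression(frequency: str, hour_utc: int = 0, weekday: str = "SUN") -> str:
    """Return a schedule expression for an hourly, daily, or weekly task."""
    if not 0 <= hour_utc <= 23:
        raise ValueError("hour_utc must be between 0 and 23")
    if frequency == "hourly":
        return "rate(1 hour)"
    if frequency == "daily":
        # Every day at hour_utc:00 UTC.
        return f"cron(0 {hour_utc} * * ? *)"
    if frequency == "weekly":
        # Once a week on the given weekday at hour_utc:00 UTC.
        return f"cron(0 {hour_utc} ? * {weekday} *)"
    raise ValueError(f"unsupported frequency: {frequency}")


# Assumed shape of the schedule parameter passed when creating or updating a task.
nightly = {"ScheduleExpression": schedule_expression("daily", hour_utc=2)}
print(nightly)
```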
Data Encryption and Validation
All your data is encrypted in transit between the DataSync agent and the DataSync service using Transport Layer Security (TLS). DataSync supports using default at-rest encryption for Amazon S3 buckets. DataSync also supports encryption of data at rest and in transit for Amazon EFS and Amazon FSx.
DataSync ensures that your data arrives intact. For each transfer, the service performs integrity checks both in transit and at rest. These checks ensure that the data written to your destination matches the data read from your source, validating consistency.
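The principle behind this kind of verification can be sketched with a checksum comparison. This is an illustration only: the source does not say which checksum DataSync uses, so SHA-256 and the `sha256_of` and `verify_copy` helpers are assumptions.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source_path: str, destination_path: str) -> bool:
    """Return True when the destination holds the same bytes as the source."""
    return sha256_of(source_path) == sha256_of(destination_path)


with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, "source.bin")
    dst = os.path.join(workdir, "destination.bin")
    with open(src, "wb") as handle:
        handle.write(b"example payload")
    shutil.copyfile(src, dst)
    intact = verify_copy(src, dst)

    # Simulate corruption at the destination by appending one byte.
    with open(dst, "ab") as handle:
        handle.write(b"!")
    corruption_detected = not verify_copy(src, dst)

print(intact, corruption_detected)
```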
File System Integration and Metadata Preservation
The DataSync agent connects to your existing storage systems using the industry-standard NFS and SMB protocols, to your Hadoop cluster as an HDFS client, to your self-managed or cloud object storage using the Amazon S3 application programming interface (API), or to Azure Blob Storage using the Blob API. The agent transfers data rapidly and writes it into your designated Amazon S3 bucket, Amazon EFS file system, Amazon FSx for Windows File Server file system, or Amazon FSx for Lustre file system.
File permissions and metadata are preserved when copying objects or data between Amazon S3, Amazon EFS, Amazon FSx for Windows File Server, and Amazon FSx for Lustre.
When copying data to Amazon S3, DataSync automatically converts each file to a single S3 object in a 1:1 relationship, and preserves POSIX metadata from NFS shares or HDFS as Amazon S3 object metadata. When you copy objects containing file system metadata back to file-based storage, the original file metadata (that DataSync copied to S3) is restored.
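The round trip described above can be sketched locally: capture a file's POSIX attributes as string key-value pairs (the form object metadata takes), then reapply them to a restored file. The key names, value formats, and both helper functions are hypothetical and are not taken from the source. The sketch assumes a POSIX system and does not restore ownership, which would require elevated privileges.

```python
import os
import stat
import tempfile


def capture_metadata(path: str) -> dict:
    """Represent POSIX attributes as string pairs, as object metadata would hold them."""
    info = os.stat(path)
    return {
        "file-owner": str(info.st_uid),
        "file-group": str(info.st_gid),
        "file-permissions": format(stat.S_IMODE(info.st_mode), "04o"),
        "file-atime": str(info.st_atime_ns),
        "file-mtime": str(info.st_mtime_ns),
    }


def restore_metadata(path: str, metadata: dict) -> None:
    """Reapply permissions and timestamps captured by capture_metadata."""
    os.chmod(path, int(metadata["file-permissions"], 8))
    os.utime(path, ns=(int(metadata["file-atime"]), int(metadata["file-mtime"])))


with tempfile.TemporaryDirectory() as workdir:
    original_path = os.path.join(workdir, "original.txt")
    restored_path = os.path.join(workdir, "restored.txt")
    for path in (original_path, restored_path):
        with open(path, "w") as handle:
            handle.write("example")
    os.chmod(original_path, 0o640)

    saved = capture_metadata(original_path)   # what would travel with the object
    restore_metadata(restored_path, saved)    # what happens on the way back
    restored = capture_metadata(restored_path)

print(saved["file-permissions"], restored["file-permissions"])
```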
Integration with Amazon Web Services Infrastructure and Management Services
DataSync works natively with Amazon Web Services security, monitoring, and audit services to simplify data movement and to provide a consistent management experience for your IT, storage, and DevOps teams. In addition to integrations with Amazon S3, Amazon EFS, and Amazon FSx, DataSync supports Amazon Virtual Private Cloud (VPC) endpoints (powered by Amazon PrivateLink) to move files directly into your Amazon VPC. As with other Amazon Web Services services, you can use Amazon Identity and Access Management (IAM) to securely manage DataSync access. Similarly, you can configure an IAM role to control the services accessing your Amazon S3 bucket.
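An IAM role for bucket access generally pairs a trust policy (who may assume the role) with an access policy (what the role may do). The sketch below builds both as plain dictionaries. The service principal name and the short action list are assumptions for illustration and are likely incomplete for a real transfer; the `datasync_s3_role_policies` helper and the bucket name are hypothetical. The `partition` argument is there because ARNs in the China Regions use `aws-cn` rather than `aws`.

```python
import json


def datasync_s3_role_policies(bucket_name: str, partition: str = "aws") -> dict:
    """Return trust and access policy documents for a hypothetical DataSync role."""
    bucket_arn = f"arn:{partition}:s3:::{bucket_name}"
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumed service principal for DataSync.
                "Principal": {"Service": "datasync.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    access_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {   # Bucket-level actions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {   # Object-level actions apply to keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }
    return {"trust": trust_policy, "access": access_policy}


policies = datasync_s3_role_policies("example-bucket", partition="aws-cn")
print(json.dumps(policies["access"], indent=2))
```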
Monitoring and Auditing
DataSync task reports provide JSON-formatted output files that include a summary and detailed reports for all files transferred, skipped, verified, and deleted, so you can verify and audit the data transfer operations for each task execution. Task reports are generated after your transfer tasks complete and are stored in your Amazon S3 bucket. You can then use services such as Amazon Glue and Amazon Athena to automatically catalog and analyze task report output and check the progress of your data transfers across all task executions. Task reports simplify tracking and auditing, helping you understand common task execution trends or failure patterns and gain insight into your data transfer processes.
With Amazon CloudWatch, you can monitor the status of any DataSync transfers currently in progress and check previous data transfer history. With CloudWatch Metrics, you can see the number of files and amount of data copied. Consult CloudWatch Logs for information about individual files transferred at a given time, as well as the results of DataSync integrity verification. This simplifies monitoring, reporting, and troubleshooting, enabling you to provide timely updates to stakeholders. In addition, CloudWatch Events are triggered as your transfer tasks complete, enabling automation of dependent workflows. For audit purposes, in addition to task reports, you can consult Amazon CloudTrail, which logs all actions performed by DataSync.
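Automating a dependent workflow usually starts with an event pattern that selects completed task executions. The sketch below shows one such pattern and a simplified local matcher for experimenting with it. The `source`, `detail-type`, and `State` values are assumptions about the event shape and should be confirmed against the DataSync monitoring documentation. The `matches` function is a hypothetical helper that covers only exact-value matching, not the full pattern syntax, and the sample events are invented for the demonstration.

```python
# Assumed pattern for successful DataSync task executions.
success_pattern = {
    "source": ["aws.datasync"],
    "detail-type": ["DataSync Task Execution State Change"],
    "detail": {"State": ["SUCCESS"]},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: each pattern key must exist and hold a listed value."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


# Hypothetical sample events.
finished = {
    "source": "aws.datasync",
    "detail-type": "DataSync Task Execution State Change",
    "detail": {"State": "SUCCESS"},
}
failed = {**finished, "detail": {"State": "ERROR"}}

print(matches(success_pattern, finished), matches(success_pattern, failed))
```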
Pay-As-You-Go Pricing
With Amazon DataSync, you pay only for data copied by the service at a flat, per-gigabyte rate. No software licenses, contracts, maintenance fees, development cycles, or hardware are required. This provides a lower total cost of ownership (TCO) compared to manually building, operating, and optimizing your own high-performance scripted transfers, as well as lower total cost than buying and running commercial transfer tools.
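Because the charge is a flat per-gigabyte rate, estimating a transfer's cost is a single multiplication. The sketch below uses a placeholder rate of 0.01 per GB, which is hypothetical and not an actual DataSync price; the `transfer_cost` helper is also hypothetical. Substitute the rate published for your Region.

```python
from decimal import Decimal


def transfer_cost(gigabytes_copied: Decimal, rate_per_gb: Decimal) -> Decimal:
    """Flat-rate cost: data copied multiplied by the per-gigabyte rate."""
    if gigabytes_copied < 0 or rate_per_gb < 0:
        raise ValueError("inputs must be non-negative")
    return gigabytes_copied * rate_per_gb


# 10 TiB expressed in GiB, at a placeholder rate (not a real price).
data_copied_gb = Decimal(10 * 1024)
placeholder_rate = Decimal("0.01")
estimate = transfer_cost(data_copied_gb, placeholder_rate)
print(estimate)
```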