Amazon S3 has various features you can use to organize and manage your data in ways that support specific use cases, enable cost efficiencies, enforce security, and meet compliance requirements. Data is stored as objects within resources called “buckets”, and a single object can be up to 5 terabytes in size. S3 features include capabilities to append metadata tags to objects, move and store data across the S3 Storage Classes, configure and enforce data access controls, secure data against unauthorized users, run big data analytics, and monitor data at the object and bucket levels. Objects can be accessed through S3 Access Points or directly through the bucket hostname.
- Each object is stored in a bucket and retrieved via a unique, developer-assigned key.
- Objects stored in a specific Amazon Web Services Region never leave that Region unless you transfer them out.
- Authentication mechanisms are provided to ensure that data is kept secure from unauthorized access. Objects can be made private or public, and rights can be granted to specific users.
- Uses standards-based REST and SOAP interfaces designed to work with any Internet-development toolkit.
- Built to be flexible so that protocol or functional layers can easily be added. The default download protocol is HTTP, and the S3 API also supports HTTPS. Amazon CLI and SDK use secure HTTPS connections by default.
- Provides functionality to simplify manageability of data through its lifetime. Includes options for segregating data by buckets, monitoring and controlling spend, and automatically archiving data to even lower cost storage options. These options can be easily administered from the Amazon S3 Management Console.
Protecting Your Data
Data stored in Amazon S3 is secure by default; only bucket and object owners have access to the Amazon S3 resources they create. Amazon S3 supports multiple access control mechanisms. With Amazon S3’s data protection features, you can protect your data from both logical and physical failures, guarding against data loss from unintended user actions, application errors, and infrastructure failures. For customers who must comply with regulatory standards, Amazon S3’s data protection features can be used as part of an overall strategy to achieve compliance. The various data security and reliability features offered by Amazon S3 are described in detail below.
Amazon S3 offers flexible security features to block unauthorized users from accessing your data. Use gateway VPC endpoints and interface VPC endpoints to connect to S3 resources from your Amazon Virtual Private Cloud (Amazon VPC) and from on premises. Amazon S3 automatically encrypts all object uploads to all buckets (as of January 5, 2023). Amazon S3 supports both server-side encryption, with four key management options (SSE-S3 as the base level of encryption, SSE-KMS, DSSE-KMS, and SSE-C), and client-side encryption for object uploads. Use S3 Inventory to check the encryption status of your S3 objects.
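A minimal sketch of server-side encryption with boto3 (the bucket name, key, and KMS key alias below are hypothetical placeholders): upload an object with SSE-KMS and then confirm the encryption that S3 applied.

```python
import boto3

s3 = boto3.client("s3")

# Upload with SSE-KMS; omitting ServerSideEncryption would still yield the
# default SSE-S3 encryption that S3 applies to all uploads automatically.
s3.put_object(
    Bucket="example-bucket",
    Key="reports/2024/q1.csv",
    Body=b"col1,col2\n1,2\n",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/example-key",  # hypothetical KMS key alias
)

# Confirm the encryption applied to the stored object.
head = s3.head_object(Bucket="example-bucket", Key="reports/2024/q1.csv")
print(head["ServerSideEncryption"])  # "aws:kms"
```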
Amazon S3 also supports logging of requests made against your Amazon S3 resources. You can configure your Amazon S3 bucket to create access log records for the requests made against it. These server access logs capture all requests made against a bucket or the objects in it and can be used for auditing purposes.
Amazon S3 provides further protection with versioning capability. You can use versioning to preserve, retrieve, and restore every version of every object stored in your Amazon S3 bucket. This allows you to easily recover from both unintended user actions and application failures. By default, requests will retrieve the most recently written version. Older versions of an object can be retrieved by specifying a version in the request. Storage rates apply for every version stored. You can configure lifecycle rules to automatically control the lifetime and cost of storing multiple versions.
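A minimal versioning sketch (bucket and key names are hypothetical): enable versioning on a bucket, then retrieve an older version of an object by its version ID.

```python
import boto3

s3 = boto3.client("s3")

# Turn on versioning so every overwrite preserves the prior version.
s3.put_bucket_versioning(
    Bucket="example-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)

# List the versions of one object, then fetch the oldest one explicitly;
# a plain GET without VersionId returns the most recently written version.
versions = s3.list_object_versions(Bucket="example-bucket", Prefix="config.json")
oldest = versions["Versions"][-1]["VersionId"]
obj = s3.get_object(Bucket="example-bucket", Key="config.json", VersionId=oldest)
```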
Amazon S3 supports several mechanisms that give you flexibility to control who can access your data as well as how, when, and where they can access it. Amazon S3 provides four different access control mechanisms: Identity and Access Management (IAM) policies, Access Control Lists (ACLs), bucket policies, and query string authentication. IAM enables organizations with multiple employees to create and manage multiple users under a single Amazon Web Services account. With IAM policies, you can grant IAM users fine-grained control to your Amazon S3 bucket or objects. You can use ACLs to selectively add (grant) certain permissions on individual objects. Amazon S3 Bucket Policies can be used to add or deny permissions across some or all of the objects within a single bucket. With Query string authentication, you have the ability to share Amazon S3 objects through URLs that are valid for a predefined expiration time.
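A minimal sketch of query string authentication (bucket and key are hypothetical): generate a presigned URL that grants time-limited access to a single object.

```python
import boto3

s3 = boto3.client("s3")

# Anyone holding this URL can GET the object until it expires;
# no Amazon Web Services credentials are needed to use the link.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "example-bucket", "Key": "shared/report.pdf"},
    ExpiresIn=3600,  # the URL stops working after one hour
)
print(url)
```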
S3 Block Public Access is a set of security controls that ensures S3 buckets and objects do not have public access. With a few clicks in the Amazon S3 console, you can apply the S3 Block Public Access settings to all buckets within your Amazon Web Services account or to specific S3 buckets. All new buckets have Block Public Access enabled by default. If you want to restrict access to all existing buckets in your account, you can enable Block Public Access at the account level. Once the settings are applied to an account, any existing or new buckets and objects associated with that account inherit the settings that prevent public access. S3 Block Public Access settings override other S3 access permissions, making it easy for the account administrator to enforce a “no public access” policy regardless of how an object is added, how a bucket is created, or whether there are existing access permissions. S3 Block Public Access controls are auditable and provide a further layer of control, backed by Amazon Trusted Advisor bucket permission checks, Amazon CloudTrail logs, and Amazon CloudWatch alarms. You should enable Block Public Access for all accounts and buckets that you do not want publicly accessible.
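A minimal Block Public Access sketch (the bucket name and account ID are hypothetical placeholders): apply the four settings to one bucket, and to the whole account through the S3 Control API.

```python
import boto3

settings = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}

# Apply Block Public Access to a single bucket...
s3 = boto3.client("s3")
s3.put_public_access_block(
    Bucket="example-bucket",
    PublicAccessBlockConfiguration=settings,
)

# ...or to every bucket in the account via the S3 Control API.
s3control = boto3.client("s3control")
s3control.put_public_access_block(
    AccountId="111122223333",  # hypothetical account ID
    PublicAccessBlockConfiguration=settings,
)
```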
S3 Object Ownership is a feature that disables Access Control Lists (ACLs), changing ownership for all objects to the bucket owner and simplifying access management for data stored in S3. When you configure this bucket setting, ACLs will no longer affect permissions for your bucket and the objects in it. All access control will be defined using resource-based policies, user policies, or some combination of these. ACLs are automatically disabled for new buckets. Before you disable ACLs, review your bucket and object ACLs. To identify Amazon S3 requests that required ACLs for authorization, you can use the aclRequired field in Amazon S3 server access logs or Amazon CloudTrail.
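A minimal Object Ownership sketch (the bucket name is hypothetical): set the BucketOwnerEnforced setting, which disables ACLs and makes the bucket owner the owner of every object in the bucket.

```python
import boto3

s3 = boto3.client("s3")

# Disable ACLs for the bucket; access control is then defined entirely
# through bucket policies and IAM policies.
s3.put_bucket_ownership_controls(
    Bucket="example-bucket",
    OwnershipControls={"Rules": [{"ObjectOwnership": "BucketOwnerEnforced"}]},
)
```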
You can securely upload/download your data to Amazon S3 via the SSL endpoints using the HTTPS protocol.
For more information on the security features available in Amazon S3, refer to the Access Control topic in the Amazon S3 User Guide.
Amazon PrivateLink for S3 provides private connectivity between Amazon S3 and your on-premises environments. You can provision interface VPC endpoints for S3 in your VPC to connect your on-premises applications directly with S3 over Amazon Direct Connect. Requests to interface VPC endpoints for S3 are automatically routed to S3 over the Amazon Web Services China network. You can set security groups and configure VPC endpoint policies for your interface VPC endpoints for additional access controls.
Amazon S3 provides a highly durable storage infrastructure designed for mission-critical and primary data storage. Amazon S3 redundantly stores data in multiple facilities and on multiple devices within each facility. To increase durability, Amazon S3 synchronously stores your data across multiple facilities before confirming that the data has been successfully stored. In addition, Amazon S3 calculates checksums on all network traffic to detect corruption of data packets when storing or retrieving data. Unlike traditional systems, which can require laborious data verification and manual repair, Amazon S3 performs regular, systematic data integrity checks and is built to be automatically self-healing.
Standard is:
- Backed with the Amazon S3 Service Level Agreement for availability.
- Designed for 99.999999999% durability and 99.99% availability of objects over a given year.
- Designed to sustain the concurrent loss of data in two facilities.
Standard - Infrequent Access is:
- Backed with the Amazon S3 Service Level Agreement for availability.
- Designed for 99.999999999% durability and 99.9% availability of objects over a given year.
- Designed to sustain the concurrent loss of data in two facilities.
Amazon S3 Glacier Instant Retrieval is:
- An archive storage class that delivers low-cost storage for long-lived data that is rarely accessed and requires milliseconds retrieval.
- Delivers fast access to archive storage, with the same throughput and milliseconds access as the S3 Standard-IA storage class.
- Designed for 99.999999999% (11 nines) of data durability and 99.9% availability by redundantly storing data across multiple physically separated Amazon Web Services Availability Zones.
Amazon S3 Glacier Flexible Retrieval is:
- Designed for 99.999999999% durability of objects over a given year.
- Designed to sustain the concurrent loss of data in two facilities.
- Configurable retrieval times, from minutes to hours, with free bulk retrievals.
Amazon S3 Glacier Deep Archive is:
- Designed for durability of 99.999999999% of objects across multiple Availability Zones.
- Lowest cost storage class designed for long-term retention of data that will be retained for 7-10 years.
- Ideal alternative to magnetic tape libraries.
- Retrieval time within 12 hours.
Amazon S3 Intelligent-Tiering is:
- Designed for durability of 99.999999999% of objects across multiple Availability Zones.
- Designed for 99.9% availability over a given year.
- Stores objects in three automatic access tiers, optimized for frequent, infrequent, and rare access.
- Frequent and Infrequent Access tiers have the same low latency and high throughput performance as S3 Standard.
- Activate optional automatic archive capabilities for rarely accessed objects that can be accessed asynchronously to save even more.
- Optional Archive Access and Deep Archive Access tiers have the same performance as Amazon S3 Glacier Flexible Retrieval and Amazon S3 Glacier Deep Archive.
- Small monthly monitoring and auto-tiering charge; no lifecycle charges, retrieval charges, or minimum storage duration.
Amazon S3 Access Grants map identities in directories such as Active Directory, or Amazon Identity and Access Management (IAM) principals, to datasets in S3. This helps you manage data permissions at scale by automatically granting S3 access to end-users based on their corporate identity. Additionally, S3 Access Grants log end-user identity and the application used to access S3 data in Amazon CloudTrail. This helps to provide a detailed audit history down to the end-user identity for all access to the data in your S3 buckets.
Storage Management
Amazon S3 makes it easy to manage your data by giving you actionable insight into your data usage patterns and the tools to manage your storage with management policies. All of these management capabilities can be easily administered using the Amazon S3 APIs or the Management Console. The various data management features offered by Amazon S3 are described in detail below.
With Amazon S3 Object Tagging, you can manage and control access for Amazon S3 objects. S3 Object Tags are key-value pairs applied to S3 objects, and they can be created, updated, or deleted at any time during the lifetime of the object. With these tags, you can create Identity and Access Management (IAM) policies, set up S3 Lifecycle policies, and customize storage metrics. These object-level tags can then manage transitions between storage classes and expire objects in the background.
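A minimal tagging sketch (bucket, key, and tag values are hypothetical): replace the tag set on an existing object, then read it back, for example before wiring up a tag-based lifecycle or IAM policy.

```python
import boto3

s3 = boto3.client("s3")

# Replace the tag set on an existing object; tags are key-value pairs.
s3.put_object_tagging(
    Bucket="example-bucket",
    Key="invoices/2024-03.pdf",
    Tagging={
        "TagSet": [
            {"Key": "project", "Value": "alpha"},
            {"Key": "classification", "Value": "confidential"},
        ]
    },
)

# Read the tags back.
tags = s3.get_object_tagging(Bucket="example-bucket", Key="invoices/2024-03.pdf")
print(tags["TagSet"])
```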
Amazon S3 CloudWatch integration helps you improve your end-user experience by providing integrated monitoring and alarming on a host of different metrics. You can receive 1-minute CloudWatch Metrics, set CloudWatch alarms, and access CloudWatch dashboards to view real-time operations and performance of your Amazon S3 storage. For web and mobile applications that depend on cloud storage, these let you quickly identify and act on operational issues. These 1-minute metrics are available at the S3 bucket level. Additionally, you have the flexibility to define a filter for the metrics collected using a shared prefix or object tag allowing you to align metrics filters to specific business applications, workflows, or internal organizations.
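A minimal sketch of a request-metrics filter (the bucket name, configuration ID, and prefix are hypothetical): publish 1-minute CloudWatch request metrics only for objects under a shared prefix, so the metrics align with one application.

```python
import boto3

s3 = boto3.client("s3")

# Emit 1-minute request metrics for objects under "assets/" only.
s3.put_bucket_metrics_configuration(
    Bucket="example-bucket",
    Id="web-assets-metrics",
    MetricsConfiguration={
        "Id": "web-assets-metrics",
        "Filter": {"Prefix": "assets/"},
    },
)
```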
You can use Amazon CloudTrail to capture bucket-level (Management Events) and object-level API activity (Data Events) on S3 objects. Data Events include read operations such as GET, HEAD, and Get Object ACL, as well as write operations such as PUT and POST. The detail captured provides support for many types of security, auditing, governance, and compliance use cases.
Amazon S3 can automatically assign and change cost and performance characteristics as your data evolves. It can even automate common data lifecycle management tasks, including capacity provisioning, automatic migration to lower cost tiers, regulatory compliance policies, and eventual scheduled deletions.
As your data ages, Amazon S3 takes care of automatically and transparently migrating your data to new hardware as hardware fails or reaches its end of life. This eliminates the need for you to perform expensive, time-consuming, and risky hardware migrations. You can set lifecycle policies to direct Amazon S3 to automatically migrate your data to lower cost storage as your data ages. You can define rules to automatically migrate Amazon S3 objects to S3 Standard-Infrequent Access (Standard-IA), Amazon S3 Glacier Instant Retrieval, Amazon S3 Glacier Flexible Retrieval, or Amazon S3 Glacier Deep Archive based on the age or number of newer versions of the data. You can set lifecycle policies by bucket, prefix, object tags, or object size, allowing you to specify the granularity most suited to your use case.
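A minimal lifecycle sketch (bucket name, prefix, and day counts are hypothetical): step objects under a prefix down through cheaper storage classes as they age.

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-logs",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                # Transition data to lower cost storage classes as it ages.
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    },
)
```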
When your data reaches its end of life, Amazon S3 provides programmatic options for recurring and high volume deletions. For recurring deletions, rules can be defined to remove sets of objects after a predefined time period. These rules can be applied to objects stored in Standard or Standard - IA, and objects that have been archived to Amazon S3 Glacier Instant Retrieval, Amazon S3 Glacier Flexible Retrieval, or Amazon S3 Glacier Deep Archive.
You can also define lifecycle rules on versions of your Amazon S3 objects to reduce storage costs. For example, you can create rules to automatically delete older versions of your objects when these versions are no longer needed, saving money and improving performance. Alternatively, you can create rules to automatically migrate older versions to Standard-IA, Amazon S3 Glacier Flexible Retrieval, or Amazon S3 Glacier Deep Archive in order to further reduce your storage costs.
Amazon S3 Replication is an elastic, fully managed, low cost feature that replicates objects between buckets. S3 Replication offers flexibility and functionality in cloud storage, giving you the controls you need to meet your data sovereignty and other business needs.
With S3 Replication, you can replicate objects (and their respective metadata and object tags) into one or more destination buckets in the same Amazon Web Services China Region or in different Amazon Web Services China Regions for reduced latency, compliance, security, disaster recovery, and other use cases. S3 Cross-Region Replication (CRR) can be configured to replicate from a source S3 bucket to one or more destination buckets in different Amazon Web Services China Regions. S3 Same-Region Replication (SRR) replicates objects between buckets in the same Amazon Web Services China Region. S3 Batch Replication replicates existing bucket contents between buckets. You can use S3 Batch Replication to backfill a newly created bucket with existing objects, retry objects that were previously unable to replicate, migrate data across accounts, or add new buckets to your data lake. Amazon S3 Replication Time Control (S3 RTC) helps you meet your compliance requirements for data replication by providing an SLA and visibility into replication times.
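A minimal replication sketch (the bucket names, IAM role ARN, and the aws-cn ARN partition shown are hypothetical placeholders). Both buckets must have versioning enabled before replication can be configured.

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="example-source-bucket",
    ReplicationConfiguration={
        # Role that S3 assumes to replicate objects on your behalf.
        "Role": "arn:aws-cn:iam::111122223333:role/example-replication-role",
        "Rules": [
            {
                "ID": "replicate-all",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {},  # an empty filter replicates every object
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": "arn:aws-cn:s3:::example-destination-bucket"
                },
            }
        ],
    },
)
```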
Amazon S3 Intelligent-Tiering (S3 Intelligent-Tiering) is the only cloud storage class that delivers automatic cost savings by moving objects between four access tiers when access patterns change. The S3 Intelligent-Tiering storage class is designed to optimize costs by automatically moving data to the most cost-effective access tier, without operational overhead. It works by storing objects in four access tiers: two low latency access tiers optimized for frequent and infrequent access, and two optional archive access tiers, designed for asynchronous access, that are optimized for rare access. Objects uploaded or transitioned to S3 Intelligent-Tiering are automatically stored in the Frequent Access tier. For a small monthly monitoring and automation fee per object, Amazon S3 monitors access patterns of the objects in S3 Intelligent-Tiering and moves objects that have not been accessed for 30 consecutive days to the Infrequent Access tier. You can activate one or both archive access tiers to automatically move objects that haven’t been accessed for 90 days to the Archive Access tier and then, after 180 days, to the Deep Archive Access tier. If the objects are accessed later, S3 Intelligent-Tiering moves them back to the Frequent Access tier. This means all objects stored in S3 Intelligent-Tiering are always available when needed. There are no retrieval fees when using the S3 Intelligent-Tiering storage class, and no additional tiering fees when objects are moved between access tiers. There is no minimum storage duration for S3 Intelligent-Tiering. S3 Intelligent-Tiering has a minimum object size of 128 KB for auto-tiering; smaller objects can be stored in S3 Intelligent-Tiering, but they will not be monitored and will always be charged at the Frequent Access tier rates, with no monitoring and automation fee. It is the ideal storage class for data with access patterns that are unknown or unpredictable.
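A minimal sketch of activating the optional archive tiers (bucket name, configuration ID, and day thresholds are hypothetical): move untouched objects to the Archive Access tier after 90 days and the Deep Archive Access tier after 180 days.

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_intelligent_tiering_configuration(
    Bucket="example-bucket",
    Id="archive-cold-data",
    IntelligentTieringConfiguration={
        "Id": "archive-cold-data",
        "Status": "Enabled",
        # Opt in to the two archive tiers for rarely accessed objects.
        "Tierings": [
            {"Days": 90, "AccessTier": "ARCHIVE_ACCESS"},
            {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"},
        ],
    },
)
```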
Amazon S3 offers several features for managing and controlling your costs. You can use the Amazon Web Services Management Console or the Amazon S3 APIs to apply tags to your Amazon S3 buckets, enabling you to allocate your costs across multiple business dimensions, including cost centers, application names, or owners. You can then view breakdowns of these costs using Amazon Web Services’ Cost Allocation Reports, which show your usage and costs aggregated by your tags. For more information on tagging your S3 buckets, please see the Bucket Tagging topic in the Amazon S3 Developer Guide.
You can use Amazon Direct Connect to transfer large amounts of data to Amazon S3. Amazon Direct Connect makes it easy to establish a dedicated network connection from your premises to Amazon Web Services. Using Amazon Direct Connect, you can establish private connectivity between Amazon Web Services and your datacenter, office, or colocation environment, which in many cases can reduce your network costs, increase bandwidth throughput, and provide a more consistent network experience than Internet-based connections.
Amazon S3 event notifications can be sent in response to actions taken on objects uploaded or stored in Amazon S3. Notification messages can be sent through either Amazon SNS or Amazon SQS, or delivered directly to Amazon Lambda to invoke Amazon Lambda functions.
Amazon S3 event notifications enable you to run workflows, send alerts, or perform other actions in response to changes in your objects stored in Amazon S3. You can use Amazon S3 event notifications to set up triggers to perform actions including transcoding media files when they are uploaded, processing data files when they become available, and synchronizing Amazon S3 objects with other data stores. You can also set up event notifications based on object name prefixes and suffixes. For example, you can choose to receive notifications on object names that start with “images/."
Amazon S3 event notifications are set up at the bucket level, and you can configure them through the Amazon S3 console, through the REST API, or by using an Amazon SDK.
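A minimal event notification sketch (the bucket name and Lambda function ARN are hypothetical placeholders): invoke a Lambda function whenever an object whose key starts with “images/” is created.

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_notification_configuration(
    Bucket="example-bucket",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": (
                    "arn:aws-cn:lambda:cn-north-1:111122223333"
                    ":function:resize-image"  # hypothetical function
                ),
                "Events": ["s3:ObjectCreated:*"],
                # Only fire for keys beginning with "images/".
                "Filter": {
                    "Key": {
                        "FilterRules": [{"Name": "prefix", "Value": "images/"}]
                    }
                },
            }
        ]
    },
)
```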
Storage Monitoring
In addition to these management capabilities, use Amazon S3 features and other Amazon Web Services services to monitor and control your S3 resources. Apply tags to S3 buckets to allocate costs across multiple business dimensions (such as cost centers, application names, or owners), then use Amazon Cost Allocation Reports to view the usage and costs aggregated by the bucket tags. You can also use Amazon CloudWatch to track the operational health of your Amazon Web Services resources and configure billing alerts for estimated charges that reach a user-defined threshold. Use Amazon CloudTrail to track and report on bucket- and object-level activities, and configure S3 Event Notifications to trigger workflows and alerts or invoke Amazon Lambda when a specific change is made to your S3 resources. For example, you can use S3 Event Notifications to transcode media files as they are uploaded to S3, process data files as they become available, and synchronize objects with other data stores. Additionally, verify the integrity of data transferred to and from S3 and access checksum information at any time using the GetObjectAttributes S3 API or an S3 Inventory report. Choose from four supported checksum algorithms (SHA-1, SHA-256, CRC32, or CRC32C) to check data integrity on your upload and download requests, depending on your application needs.
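A minimal checksum sketch (bucket and key are hypothetical): have S3 compute a SHA-256 checksum on upload, then read it back with GetObjectAttributes.

```python
import boto3

s3 = boto3.client("s3")

# S3 computes and stores a SHA-256 checksum alongside the object.
s3.put_object(
    Bucket="example-bucket",
    Key="data/export.parquet",
    Body=b"...payload...",
    ChecksumAlgorithm="SHA256",
)

# Retrieve the stored checksum and object size at any time.
attrs = s3.get_object_attributes(
    Bucket="example-bucket",
    Key="data/export.parquet",
    ObjectAttributes=["Checksum", "ObjectSize"],
)
print(attrs["Checksum"]["ChecksumSHA256"], attrs["ObjectSize"])
```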
S3 Object Lambda
S3 Object Lambda allows you to add your own code to process data retrieved from S3 before returning it to an application. With S3 Object Lambda, you can use custom code to modify the data returned by standard S3 GET, HEAD, and LIST requests. This can be used to filter rows, dynamically resize an image, redact or mask confidential information, create a custom view of objects, or to otherwise modify data returned by S3. Powered by Lambda functions, all request and data processing runs on infrastructure that is fully managed by Amazon Web Services. Your custom code executes on-demand, eliminates the need to create and store derivative copies of your data, and requires no changes to applications.
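A minimal Object Lambda handler sketch, assuming a standard Python Lambda runtime: fetch the original object from the presigned input URL supplied in the event, transform it (here, a trivial uppercase of text content), and return the result with WriteGetObjectResponse.

```python
import urllib.request

import boto3

s3 = boto3.client("s3")


def handler(event, context):
    # S3 Object Lambda passes a presigned URL for the original object,
    # plus the route and token needed to send the response back.
    ctx = event["getObjectContext"]
    with urllib.request.urlopen(ctx["inputS3Url"]) as resp:
        original = resp.read()

    # Illustrative transformation; real handlers might redact, filter,
    # or resize instead.
    transformed = original.decode("utf-8").upper().encode("utf-8")

    s3.write_get_object_response(
        Body=transformed,
        RequestRoute=ctx["outputRoute"],
        RequestToken=ctx["outputToken"],
    )
    return {"statusCode": 200}
```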
Storage Analytics & Insights
S3 Storage Lens delivers organization-wide visibility into object storage usage and activity trends, and makes actionable recommendations to improve cost efficiency and apply data protection best practices. S3 Storage Lens is the first cloud storage analytics solution to provide a single view of object storage usage and activity across hundreds, or even thousands, of accounts in an organization, with drill-downs to generate insights at the account, bucket, or even prefix level. Drawing from more than 16 years of experience helping customers optimize their storage, S3 Storage Lens analyzes organization-wide metrics to deliver contextual recommendations to find ways to reduce storage costs and apply best practices on data protection.
With Storage Class Analysis, you can monitor the access frequency of the objects within your S3 bucket in order to transition less frequently accessed storage to a lower cost storage class. Storage Class Analysis observes usage patterns to detect infrequently accessed storage to help you transition the right objects to S3 Standard-IA, S3 One Zone-IA, Amazon S3 Glacier Flexible Retrieval, and Amazon S3 Glacier Deep Archive. You can configure a Storage Class Analysis policy to monitor an entire bucket, a prefix, or an object tag. Once Storage Class Analysis detects that data is a candidate for transition to S3 Standard-IA, S3 One Zone-IA, Amazon S3 Glacier Flexible Retrieval, or Amazon S3 Glacier Deep Archive, you can easily create a new lifecycle policy based on these results. This feature also includes a detailed daily analysis of your storage usage at the specified bucket, prefix, or tag level that you can export to an S3 bucket.
You can simplify and speed up business workflows and big data jobs using the S3 Inventory, which provides a scheduled alternative to Amazon S3’s synchronous List API. S3 Inventory provides a CSV (Comma Separated Values) flat-file output of your objects and their corresponding metadata on a daily or weekly basis for an S3 bucket or a shared prefix.
Query in Place
Amazon S3 has complementary services that query data without needing to copy and load it into a separate analytics platform or data warehouse. This means you can run data analytics directly on your data stored in Amazon S3.
Amazon S3 is compatible with Amazon Web Services analytics services Amazon Athena and Amazon Redshift Spectrum. Amazon Athena queries your data in Amazon S3 without needing to extract and load it into a separate service or platform. It uses standard SQL expressions to analyze your data, delivers results within seconds, and is commonly used for ad hoc data discovery. Amazon Redshift Spectrum also runs SQL queries directly against data at rest in Amazon S3, and is more appropriate for complex queries and large datasets (up to exabytes). Because Amazon Athena and Amazon Redshift share a common data catalog and data formats, you can use them both against the same datasets in Amazon S3.
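A minimal query-in-place sketch (the database name, table, and results location are hypothetical): run a standard SQL query against data in S3 through Amazon Athena.

```python
import boto3

athena = boto3.client("athena")

# Start a query; Athena reads the data directly from S3.
query = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) FROM access_logs GROUP BY status",
    QueryExecutionContext={"Database": "example_db"},
    ResultConfiguration={"OutputLocation": "s3://example-query-results/"},
)

# Poll get_query_execution with this ID to check for completion.
print(query["QueryExecutionId"])
```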
Performance
Amazon S3 provides industry-leading performance for cloud object storage. Amazon S3 supports parallel requests, which means you can scale your S3 performance in proportion to your compute cluster, without making any customizations to your application. Performance scales per prefix, so you can use as many prefixes as you need in parallel to achieve the required throughput; there are no limits to the number of prefixes. Each S3 prefix supports at least 3,500 requests per second to add data and 5,500 requests per second to retrieve data, making it simple to increase performance significantly.
You do not need to randomize object prefixes to achieve this request rate performance. That means you can use logical or sequential naming patterns in S3 object naming without any performance implications. Refer to the Performance Guidelines for Amazon S3 and Performance Design Patterns for Amazon S3 for the most current information about performance optimization for Amazon S3.
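A minimal sketch of scaling reads across prefixes (bucket and prefix names are hypothetical): since request rates scale per prefix, fetching several prefixes concurrently multiplies aggregate throughput.

```python
from concurrent.futures import ThreadPoolExecutor

import boto3

s3 = boto3.client("s3")


def fetch_prefix(prefix):
    # Each prefix sustains its own request rate, so prefixes can be
    # worked on in parallel without tuning the application.
    pages = s3.get_paginator("list_objects_v2").paginate(
        Bucket="example-bucket", Prefix=prefix
    )
    for page in pages:
        for obj in page.get("Contents", []):
            s3.get_object(Bucket="example-bucket", Key=obj["Key"])


with ThreadPoolExecutor(max_workers=8) as pool:
    pool.map(fetch_prefix, ["logs/2024/01/", "logs/2024/02/", "logs/2024/03/"])
```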
Amazon S3 delivers strong read-after-write consistency automatically for all applications for any storage request, without changes to performance or availability, without sacrificing regional isolation for applications, and at no additional cost. With strong consistency, S3 accelerates and simplifies the migration of on-premises analytics workloads, like Apache Spark and Apache Hadoop, by removing the need to make changes to applications, and reduces costs by removing the need for extra infrastructure to provide strong consistency.
Any request for S3 storage is strongly consistent. After a successful write of a new object or an overwrite of an existing object, any subsequent read request immediately receives the latest version of the object. S3 also provides strong consistency for list operations, so after a write, you can immediately perform a listing of the objects in a bucket with any changes reflected.
Intended Usage and Restrictions
Your use of this service is subject to the Amazon Web Services Customer Agreement.