Amazon Redshift extends data warehouse queries to your data lake, with no loading required. You can run analytic queries against petabytes of data stored locally in Redshift, and directly against exabytes of data stored in Amazon S3. It is simple to set up, automates most of your administrative tasks, and delivers fast performance at any scale.
Faster performance
Massively parallel
Amazon Redshift delivers fast query performance on datasets ranging in size from gigabytes to exabytes. Redshift uses columnar storage, data compression, and zone maps to reduce the amount of I/O needed to perform queries. It uses a massively parallel processing (MPP) data warehouse architecture to parallelize and distribute SQL operations to take advantage of all available resources. The underlying hardware is designed for high performance data processing, using local attached storage to maximize throughput between the CPUs and drives, and a high bandwidth mesh network to maximize throughput between nodes.
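As a rough illustration of how zone maps reduce I/O, the Python sketch below keeps the minimum and maximum value for each block of a column and skips any block whose range cannot match a filter. The block size, function names, and data are hypothetical, and the code is a simplified model, not Redshift's internal implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """A fixed-size chunk of one column plus its zone map (min and max)."""
    values: List[int]
    min_value: int
    max_value: int


def build_blocks(column: List[int], block_size: int) -> List[Block]:
    """Split a column into blocks and record each block's min and max."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_between(blocks: List[Block], low: int, high: int) -> Tuple[List[int], int]:
    """Return the matching values and the number of blocks actually read."""
    matches: List[int] = []
    blocks_read = 0
    for block in blocks:
        # Skip the block when its value range cannot overlap [low, high].
        if block.max_value < low or block.min_value > high:
            continue
        blocks_read += 1
        matches.extend(v for v in block.values if low <= v <= high)
    return matches, blocks_read


# Hypothetical sorted column of 1,000 order IDs stored in blocks of 100 values.
blocks = build_blocks(list(range(1000)), block_size=100)
matches, blocks_read = scan_between(blocks, 250, 260)
print(f"{len(matches)} matching values, {blocks_read} of {len(blocks)} blocks read")
```

In this toy model the filter touches only the block whose range covers the requested values, which is why zone maps tend to help most when the data is sorted on the filtered column.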
Machine learning
Amazon Redshift uses machine learning to deliver high throughput, irrespective of your workloads or concurrent usage. Redshift utilizes sophisticated algorithms to predict incoming query run times, and assigns them to the optimal queue for the fastest processing. For example, queries such as dashboards and reports with high concurrency requirements are routed to an express queue for immediate processing. As concurrency increases further, Amazon Redshift predicts when queuing may begin and automatically deploys transient resources with the Concurrency Scaling feature to ensure consistently fast performance, irrespective of variability in demand on the cluster.
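The routing idea can be sketched in a few lines of Python. This is only a toy model: the average of past run times stands in for Redshift's machine learning predictor, and the threshold, queue names, and query labels are all hypothetical.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical cutoff: queries predicted to finish faster than this
# are sent to the express queue.
EXPRESS_THRESHOLD_SECONDS = 5.0


def predict_runtime(history: Dict[str, List[float]], query_kind: str) -> float:
    """Stand-in predictor: the average of past run times for this kind of query."""
    past = history.get(query_kind)
    # With no history, assume a long run time so unknown queries
    # cannot crowd the express queue.
    return mean(past) if past else float("inf")


def choose_queue(history: Dict[str, List[float]], query_kind: str) -> str:
    """Route short predicted queries to 'express' and everything else to 'standard'."""
    predicted = predict_runtime(history, query_kind)
    return "express" if predicted < EXPRESS_THRESHOLD_SECONDS else "standard"


# Hypothetical run-time history in seconds.
history = {
    "dashboard_refresh": [0.8, 1.2, 1.0],
    "nightly_etl": [420.0, 515.0],
}

for kind in ("dashboard_refresh", "nightly_etl", "ad_hoc_export"):
    print(kind, "->", choose_queue(history, kind))
```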
Result caching
Amazon Redshift uses result caching to deliver sub-second response times for repeat queries. Dashboard, visualization, and business intelligence tools that execute repeat queries experience a significant performance boost. When a query executes, Redshift searches the cache to see if there is a cached result from a prior run. If a cached result is found and the data has not changed, the cached result is returned immediately instead of re-running the query.
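The cache lookup described above can be sketched as follows. This is a minimal model, assuming a single integer `data_version` that changes whenever the underlying data changes; the class name, the version counter, and the fake executor are hypothetical and are not part of any Redshift API.

```python
from typing import Callable, Dict, List, Tuple

Rows = List[tuple]


class ResultCache:
    """Toy result cache keyed by query text and validated by a data version."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[int, Rows]] = {}
        self.hits = 0
        self.misses = 0

    def run(self, sql: str, data_version: int, execute: Callable[[str], Rows]) -> Rows:
        cached = self._entries.get(sql)
        # Reuse the cached result only if the data has not changed since it was stored.
        if cached is not None and cached[0] == data_version:
            self.hits += 1
            return cached[1]
        # Otherwise run the query and remember the result for next time.
        self.misses += 1
        rows = execute(sql)
        self._entries[sql] = (data_version, rows)
        return rows


executed: List[str] = []


def fake_execute(sql: str) -> Rows:
    """Hypothetical executor that records each real execution."""
    executed.append(sql)
    return [(len(executed),)]


cache = ResultCache()
query = "SELECT count(*) FROM sales"
first = cache.run(query, 1, fake_execute)   # miss: nothing cached yet
second = cache.run(query, 1, fake_execute)  # hit: same query, unchanged data
third = cache.run(query, 2, fake_execute)   # miss: the data version changed
print(cache.hits, cache.misses, len(executed))
```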
Easy to set up, deploy, and manage
Automated provisioning
Amazon Redshift is simple to set up and operate. You can deploy a new data warehouse with just a few clicks in the Amazon Web Services Management Console, and Redshift automatically provisions the infrastructure for you. Most administrative tasks are automated, such as backups and replication, so you can focus on your data, not the administration. When you want control, Redshift provides options to help you make adjustments tuned to your specific workloads. New capabilities are released transparently, eliminating the need to schedule and apply upgrades and patches.
Automated backups
Amazon Redshift automatically and continuously backs up your data to Amazon S3. Redshift can asynchronously replicate your snapshots to S3 in another region for disaster recovery. You can use any system or user snapshot to restore your cluster using the Amazon Web Services Management Console or the Redshift APIs. Your cluster is available as soon as the system metadata has been restored, and you can start running queries while user data is spooled down in the background.
Fault tolerant
Amazon Redshift has multiple features that enhance the reliability of your data warehouse cluster. Redshift continuously monitors the health of the cluster, and automatically re-replicates data from failed drives and replaces nodes as necessary for fault tolerance.
Flexible querying
Amazon Redshift gives you the flexibility to execute queries within the console or connect SQL client tools, libraries, or Business Intelligence tools you love. Query Editor on the Amazon Web Services console provides a powerful interface for executing SQL queries on Redshift clusters and viewing the query results and query execution plan (for queries executed on compute nodes) adjacent to your queries.
Integrated with third-party tools
Enhance Amazon Redshift by working with industry-leading tools and experts for loading, transforming and visualizing data.
Cost-effective
No upfront costs, pay as you go
Amazon Redshift is the most cost-effective data warehouse, and you pay only for the resources you provision. Redshift is the only cloud data warehouse that offers On-Demand pricing with no up-front costs, Reserved Instance pricing which can save you up to 75% by committing to a 1- or 3-year term, and per query pricing based on the amount of data scanned in your Amazon S3 data lake. For more information, see the Amazon Redshift pricing page.
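To make the three pricing models concrete, the Python sketch below works through the arithmetic. The hourly rate, the per-terabyte scan price, and the cluster size are hypothetical placeholders, and the 75% figure is the upper bound quoted above rather than a guaranteed discount; see the Amazon Redshift pricing page for actual rates.

```python
HOURS_PER_YEAR = 24 * 365
BYTES_PER_TB = 1024 ** 4

# Hypothetical prices, used only to illustrate the arithmetic.
ON_DEMAND_RATE_PER_NODE_HOUR = 1.00
SCAN_PRICE_PER_TB = 5.00


def on_demand_annual_cost(nodes: int, rate_per_node_hour: float) -> float:
    """Cost of running every node around the clock for one year at On-Demand rates."""
    return nodes * rate_per_node_hour * HOURS_PER_YEAR


def reserved_annual_cost(on_demand_cost: float, discount: float) -> float:
    """Apply a Reserved Instance discount expressed as a fraction between 0 and 1."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return on_demand_cost * (1.0 - discount)


def scan_cost(bytes_scanned: int, price_per_tb: float) -> float:
    """Per-query cost based on the amount of data scanned in the S3 data lake."""
    return bytes_scanned / BYTES_PER_TB * price_per_tb


on_demand = on_demand_annual_cost(nodes=4, rate_per_node_hour=ON_DEMAND_RATE_PER_NODE_HOUR)
reserved = reserved_annual_cost(on_demand, discount=0.75)  # best case quoted in the text
query_cost = scan_cost(2 * BYTES_PER_TB, SCAN_PRICE_PER_TB)
print(f"on-demand: {on_demand:.2f}, reserved: {reserved:.2f}, 2 TB scan: {query_cost:.2f}")
```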
Choose your node type
You can select from two node types to optimize Redshift for your data warehousing needs. Dense Compute (DC) nodes allow you to create very high performance data warehouses using fast CPUs, large amounts of RAM, and solid-state disks (SSDs). If you want to scale further or reduce costs, you can switch to our more cost-effective Dense Storage (DS) node types that use larger hard disk drives for a very low price point. Scaling your cluster or switching between node types requires a single API call or a few clicks in the Amazon Web Services Management Console.
Scale quickly to meet your needs
Petabyte-scale data warehousing
Amazon Redshift scales quickly as your needs change. With a few clicks in the console or a simple API call, you can change the number or type of nodes in your data warehouse to scale up or down.
Exabyte-scale data lake analytics
Redshift Spectrum, a feature of Redshift, enables you to run queries against exabytes of data in Amazon S3 without having to load or transform any data. You can use S3 as a highly available, secure, and cost-effective data lake to store unlimited data in open data formats.
Limitless concurrency
Amazon Redshift provides consistently fast performance, even with thousands of concurrent queries - whether they query data in your Amazon Redshift data warehouse, or directly in your Amazon S3 data lake.
Query your data lake
Amazon S3 data lake
Amazon Redshift is the only data warehouse that extends your queries to your Amazon S3 data lake without loading data. You can query open file formats you already use, such as Avro, CSV, Grok, JSON, ORC, Parquet, and more, directly in S3. This gives you the flexibility to store highly structured, frequently accessed data on Redshift local disks, keep exabytes of structured and unstructured data in S3, and query seamlessly across both to provide unique insights that you would not be able to obtain by querying independent datasets.
Amazon Redshift Concurrency Scaling
Get consistent, fast query performance for highly concurrent workloads
Analytics workloads can be highly unpredictable, resulting in slower query performance and users competing for resources. Customers need an automated, cost-effective solution that handles ever-changing query volumes without compromising performance.
With the Concurrency Scaling feature, you can easily support thousands of concurrent users and concurrent queries, with consistently fast query performance. As concurrency increases, Amazon Redshift automatically adds query processing power in seconds to process queries without any delays. Once the workload demand subsides, this extra processing power is automatically removed, so you pay only for the time when Concurrency Scaling clusters are in use. Amazon Redshift allows customers to scale with minimal cost impact, as each cluster earns up to one hour of free Concurrency Scaling credits per day. These free credits are sufficient for the concurrency needs of 97% of Redshift customers. See the pricing page for more details.
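The effect of the free credits on a bill can be sketched with simple arithmetic. The Python example below assumes that a cluster earns the full hour every day and that unused credits do not carry over; both are simplifying assumptions made for illustration, and the per-second rate and usage figures are hypothetical.

```python
from typing import List

# One hour of free Concurrency Scaling credits per day, per the text above.
FREE_SECONDS_PER_DAY = 60 * 60

# Hypothetical price per second of Concurrency Scaling cluster usage.
RATE_PER_SECOND = 0.001


def billable_seconds(daily_usage_seconds: List[int]) -> int:
    """Sum the usage that exceeds the free daily credit.

    Simplification: unused credits are not carried over to later days.
    """
    return sum(max(0, used - FREE_SECONDS_PER_DAY) for used in daily_usage_seconds)


# Hypothetical usage: 10 minutes, 60 minutes, and 90 minutes on three days.
usage = [600, 3600, 5400]
billable = billable_seconds(usage)
cost = billable * RATE_PER_SECOND
print(f"billable seconds: {billable}, cost: {cost:.2f}")
```

Under these assumptions only the third day exceeds the free hour, so only its extra 30 minutes are charged.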
With Concurrency Scaling you can:
- Get consistently fast performance for thousands of concurrent queries and users
- Allocate the clusters to specific user groups and workloads, and control the number of clusters that can be used
- Continue to use your existing applications and Business Intelligence tools
To enable Concurrency Scaling, simply set the Concurrency Scaling Mode to Auto in the Redshift Console.
Amazon Redshift Data Sharing
Amazon Redshift data sharing allows you to extend the ease of use, performance, and cost benefits that Amazon Redshift offers in a single cluster to multi-cluster deployments while being able to share data. Data sharing enables instant, granular, and fast data access across Amazon Redshift clusters without the need to copy or move data. Data sharing provides live access to data so that your users always see the most up-to-date and consistent information as it’s updated in the data warehouse. You can securely share live data with Amazon Redshift clusters in the same or different Amazon Web Services accounts.
Amazon Redshift data sharing provides:
- A simple and direct way to share data across Amazon Redshift data warehouses.
- Instant, granular, and high-performance access without data copies and data movement.
- Live and transactionally consistent views of data across all consumers.
- Secure and governed collaboration within and across organizations and external parties.
There is no additional cost to use data sharing on your Amazon Redshift clusters.
Data sharing builds on Amazon Redshift RA3 managed storage, which decouples storage and compute, allowing either of them to scale independently. With data sharing, workloads accessing shared data are isolated from each other. Queries accessing shared data run on the consumer cluster and read data from the Amazon Redshift managed storage layer directly without impacting the performance of the producer cluster. You can now rapidly onboard any number of workloads with diverse data access patterns and SLA requirements and not be concerned about resource contention. Workloads accessing shared data can be provisioned with flexible compute resources that meet their workload-specific price performance requirements and be scaled independently as needed in a self-service fashion.