Amazon Elastic MapReduce
Hosted Hadoop Framework
Amazon Elastic MapReduce
Hosted Hadoop Framework
Amazon Elastic MapReduce (Amazon EMR) is a web service that makes it easy to quickly and cost-effectively process vast amounts of data.Amazon EMR is the industry-leading cloud big data platform for processing vast amounts of data using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto. Amazon EMR makes it easy to set up, operate, and scale your big data environments by automating time-consuming tasks like provisioning capacity and tuning clusters and uses Hadoop, an open source framework, to distribute your data and processing across a resizable cluster of Amazon EC2 instances. Amazon EMR is used in a variety of applications, including log analysis, web indexing, data warehousing, machine learning, financial analysis, scientific simulation, and bioinformatics. Customers launch millions of Amazon EMR clusters every year.
Amazon Elastic MapReduce (Amazon EMR) is a web service that makes it easy to quickly and cost-effectively process vast amounts of data.Amazon EMR is the industry-leading cloud big data platform for processing vast amounts of data using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto. Amazon EMR makes it easy to set up, operate, and scale your big data environments by automating time-consuming tasks like provisioning capacity and tuning clusters and uses Hadoop, an open source framework, to distribute your data and processing across a resizable cluster of Amazon EC2 instances. Amazon EMR is used in a variety of applications, including log analysis, web indexing, data warehousing, machine learning, financial analysis, scientific simulation, and bioinformatics. Customers launch millions of Amazon EMR clusters every year.
Benefits
Easy to Use
Low Cost
EMR pricing is simple and predictable: You pay a per-instance rate for every second used, with a one-minute minimum charge. You can launch a 10-node EMR cluster for as little as ¥0.187 per hour. You can save the cost of the instances by selecting Amazon EC2 Spot for transient workloads and Reserved Instances for long-running workloads.
Elastic
Reliable
Secure
EMR automatically configures EC2 firewall settings, controlling network access to instances and launches clusters in an Amazon Virtual Private Cloud (VPC). Server-side encryption or client-side encryption can be used with the Amazon Key Management Service or your own customer-managed keys. EMR makes it easy to enable other encryption options, like in-transit and at-rest encryption, and strong authentication with Kerberos. You can use Amazon Lake Formation or Apache Ranger to apply fine-grained data access controls for databases, tables, and columns.
Flexible
Benefits
Easy to Use
Low Cost
EMR pricing is simple and predictable: You pay a per-instance rate for every second used, with a one-minute minimum charge. You can launch a 10-node EMR cluster for as little as ¥0.187 per hour. You can save the cost of the instances by selecting Amazon EC2 Spot for transient workloads and Reserved Instances for long-running workloads.
Elastic
Reliable
Secure
EMR automatically configures EC2 firewall settings, controlling network access to instances and launches clusters in an Amazon Virtual Private Cloud (VPC). Server-side encryption or client-side encryption can be used with the Amazon Key Management Service or your own customer-managed keys. EMR makes it easy to enable other encryption options, like in-transit and at-rest encryption, and strong authentication with Kerberos. You can use Amazon Lake Formation or Apache Ranger to apply fine-grained data access controls for databases, tables, and columns.
Flexible
Use Cases
Extract, Transform, Load (ETL)
Clickstream Analysis
Machine Learning
Real-Time Streaming
Genomics
Use Cases
Extract, Transform, Load (ETL)
Clickstream Analysis
Machine Learning
Use EMR's built-in machine learning tools, including Apache Spark MLlib, TensorFlow, and Apache MXNet for scalable machine learning algorithms, and use custom AMIs and bootstrap actions to easily add your preferred libraries and tools to create your own predictive analytics toolset.
Real-Time Streaming
Analyze events from Apache Kafka, Amazon Kinesis, or other streaming data sources in real-time with Apache Spark Streaming and Apache Flink to create long-running, highly available, and fault-tolerant streaming data pipelines on EMR.
Genomics
EMR can be used to process vast amounts of genomic data and other large scientific data sets quickly and efficiently. Researchers can access genomic data hosted for free on Amazon Web Services.