With Amazon Glue, you pay an hourly rate, billed by the second, for crawlers (discovering data) and ETL jobs (processing and loading data). For the Amazon Glue Data Catalog, you pay a simple monthly fee for storing and accessing the metadata. If you provision a development endpoint to interactively develop your ETL code, you pay an hourly rate, billed per second.
ETL jobs and interactive sessions
With Amazon Glue, you only pay for the time your ETL job takes to run. There are no resources to manage, no upfront costs, and you are not charged for startup or shutdown time. We charge you an hourly rate based on the number of Data Processing Units (DPUs) used to run your ETL job. A single DPU provides 4 vCPU and 16 GB of memory. We bill for jobs and development endpoints in increments of 1 second, rounded up to the nearest second.
There are three types of Amazon Glue jobs: Apache Spark, Spark Streaming, and Python shell.
Apache Spark and Spark Streaming job runs require a minimum of 2 DPU. By default, Amazon Glue allocates 10 DPU to each Apache Spark job and 2 DPU to each streaming job. Jobs using Amazon Glue version 0.9 or 1.0 have a 10-minute minimum billing duration, while jobs that use Glue versions 2.0 and later have a 1-minute minimum.
For Python shell jobs, you can allocate either 1 DPU or 0.0625 DPU. By default, Amazon Glue allocates 0.0625 DPU to each Python shell job. These jobs have a 1-minute minimum billing duration.
Interactive Sessions are optional, and billing applies only if you use them for interactive ETL code development. We charge for Interactive Sessions based on the time the session is active and the number of DPUs. Interactive Sessions have configurable idle timeouts. They require a minimum of 2 DPUs and default to 5 DPUs. There is a 1-minute minimum billing duration for each provisioned Interactive Session. Amazon Glue Studio Job Notebooks provide a built-in interface for Interactive Sessions. We do not charge for the Job Notebooks but do charge for the Interactive Sessions they use.
Development endpoints are optional, and billing applies only if you use them for interactive ETL code development. We charge for development endpoints based on the time the endpoint is provisioned and the number of DPUs. Development endpoints do not time out. They require a minimum of 2 DPUs and default to 5 DPUs. There is a 10-minute minimum billing duration for each provisioned development endpoint.
Amazon Glue Studio data previews allow you to test your transformations during the job authoring process. Each Amazon Glue Studio data preview session uses 2 DPUs, runs for 30 minutes, and stops automatically.
Pricing
- ¥3.021 per DPU-Hour for each Apache Spark or Spark Streaming job, billed per second with a 1-minute minimum (Glue version 2.0 and later) or 10-minute minimum (Glue version 0.9/1.0)
- ¥3.021 per DPU-Hour for each Python Shell job, billed per second, with a 1-minute minimum
- ¥3.021 per DPU-Hour for each provisioned Development Endpoint, billed per second with a 10-minute minimum
- ¥3.021 per DPU-Hour for each Interactive Session, billed per second with a 1-minute minimum
- ¥3.021 per DPU-Hour for each Amazon Glue Studio data preview session, billed in 30-minute units and invoiced as development endpoints
Additional charges
If your ETL jobs read from or write to data sources such as Amazon S3, Amazon RDS, or Amazon Redshift, you are charged standard request and data transfer rates. If you use Amazon CloudWatch, you are charged standard rates for CloudWatch logs and CloudWatch events.
Pricing example
ETL job example: Consider an Amazon Glue Apache Spark job that runs for 15 minutes and uses 6 DPUs. The price of 1 DPU-Hour is ¥3.021. Since your job ran for 1/4 of an hour and used 6 DPUs, we will bill you 6 DPUs * 1/4 hour * ¥3.021, or ¥4.532.
Amazon Glue Studio Job Notebooks and Interactive Sessions example: Suppose you use a notebook in Amazon Glue Studio to interactively develop your ETL code. An Interactive Session has 5 DPU by default. If you keep the session running for 24 minutes or 2/5th of an hour, you will be billed for 5 DPUs * 2/5 hour at ¥3.021 per DPU-Hour or ¥6.042.
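The billing rules and the two examples above can be sketched in a few lines of Python. This is only an illustrative estimate, not an official calculator: the function name `glue_dpu_cost` and the resource labels are hypothetical, and the rate and minimum durations are copied from the price list above.

```python
import math

RATE_PER_DPU_HOUR = 3.021  # CNY per DPU-Hour, from the price list above

# Minimum billed duration in seconds per resource type (labels are hypothetical).
MIN_SECONDS = {
    "spark_v2_plus": 60,         # Spark / Spark Streaming, Glue 2.0 and later
    "spark_legacy": 600,         # Spark / Spark Streaming, Glue 0.9 or 1.0
    "python_shell": 60,
    "interactive_session": 60,
    "dev_endpoint": 600,
}

def glue_dpu_cost(kind, dpus, runtime_seconds):
    """Estimate the charge in CNY for one run or one provisioned period."""
    # Round up to a whole second, then apply the minimum billed duration.
    billed_seconds = max(math.ceil(runtime_seconds), MIN_SECONDS[kind])
    return dpus * billed_seconds / 3600 * RATE_PER_DPU_HOUR

# The ETL job example: 6 DPUs for 15 minutes.
etl_cost = glue_dpu_cost("spark_v2_plus", dpus=6, runtime_seconds=15 * 60)
# The Interactive Session example: 5 DPUs for 24 minutes.
session_cost = glue_dpu_cost("interactive_session", dpus=5, runtime_seconds=24 * 60)
# A 20-second Python shell run is billed as a full minute.
short_shell_cost = glue_dpu_cost("python_shell", dpus=0.0625, runtime_seconds=20)

print(etl_cost, session_cost, short_shell_cost)
```

The last call shows the effect of the minimum: a very short run is still billed for 60 seconds.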
ML Transforms example: As with Amazon Glue job runs, the cost of running ML Transforms, including FindMatches, varies with the size and content of your data and with the number and types of nodes that you use. In the following example, we used FindMatches to integrate points-of-interest information from multiple data sources. With a data set of ~11,000,000 rows (1.6 GB) and a label data set (examples of true matches or true no-matches) of ~8,000 rows (641 KB), running on 16 instances of type G.2X, you would have a labelset generation runtime of 34 minutes at a cost of ¥54.781, a metrics estimation runtime of 11 minutes at a cost of ¥17.723, and a FindMatches job execution runtime of 32 minutes at a cost of ¥51.558.
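The three costs in the ML Transforms example can be reproduced with the same DPU-Hour rate. The sketch below assumes that each G.2X instance counts as 2 DPUs; this mapping is not stated in the text above, but it is the value that makes the quoted figures consistent. The function name is hypothetical.

```python
RATE_PER_DPU_HOUR = 3.021    # CNY per DPU-Hour, from the price list
DPUS_PER_G2X_INSTANCE = 2    # assumption: not stated in the example itself
INSTANCES = 16

def findmatches_stage_cost(minutes):
    """Estimated cost in CNY of one stage running on all instances."""
    total_dpus = INSTANCES * DPUS_PER_G2X_INSTANCE
    return total_dpus * minutes / 60 * RATE_PER_DPU_HOUR

stage_minutes = {
    "labelset generation": 34,
    "metrics estimation": 11,
    "FindMatches job execution": 32,
}
stage_costs = {
    name: round(findmatches_stage_cost(minutes), 3)
    for name, minutes in stage_minutes.items()
}
print(stage_costs)
```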
Data Catalog storage and requests
With the Amazon Glue Data Catalog, you will be charged ¥6.866 per 100,000 objects, per month. An object in the Amazon Glue Data Catalog is a table, table version, partition, or database.
You will be charged ¥6.866 per million requests. Common requests include CreateTable, CreatePartition, GetTable, and GetPartitions. For the complete list of requests supported by the Amazon Glue Data Catalog, please see our documentation.
Pricing
Storage:
- ¥6.866 per 100,000 objects in a month
Requests:
- ¥6.866 per million requests in a month
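As a rough illustration, a monthly Data Catalog charge could be estimated as follows. The sketch assumes that usage is prorated linearly against the two rates and that no free allowance applies; neither point is stated above, and the object and request counts are made-up inputs.

```python
STORAGE_RATE = 6.866   # CNY per 100,000 objects per month
REQUEST_RATE = 6.866   # CNY per 1,000,000 requests per month

def data_catalog_monthly_cost(objects, requests):
    """Estimate the monthly charge in CNY (hypothetical helper)."""
    # Assumption: linear proration, no free tier, no rounding to whole blocks.
    storage_cost = objects / 100_000 * STORAGE_RATE
    request_cost = requests / 1_000_000 * REQUEST_RATE
    return storage_cost + request_cost

# Hypothetical usage: 250,000 stored objects and 3 million requests in a month.
monthly_cost = data_catalog_monthly_cost(objects=250_000, requests=3_000_000)
print(round(monthly_cost, 3))
```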
Crawlers
There is an hourly rate for Amazon Glue crawler runtime to discover data and populate the Amazon Glue Data Catalog. You are charged an hourly rate based on the number of Data Processing Units (or DPUs) used to run your crawler. A single Data Processing Unit (DPU) provides 4 vCPU and 16 GB of memory. You are billed in increments of 1 second, rounded up to the nearest second, with a 10-minute minimum duration for each crawl. Use of Amazon Glue crawlers is optional, and you can populate the Amazon Glue Data Catalog directly through the API.
Pricing
- ¥3.021 per DPU-Hour, billed per second, with a 10-minute minimum per crawler run
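Because every crawler run is billed for at least 10 minutes, several short runs can cost more than their combined runtime suggests. The following sketch totals the billed time over a set of runs. The run durations and the DPU count are hypothetical inputs, since the number of DPUs a crawler uses is not given above.

```python
import math

CRAWLER_RATE = 3.021          # CNY per DPU-Hour, from the price above
MIN_BILLED_SECONDS = 10 * 60  # 10-minute minimum per crawler run

def billed_crawl_seconds(runtime_seconds):
    """Seconds billed for one run: rounded up, never below the minimum."""
    return max(math.ceil(runtime_seconds), MIN_BILLED_SECONDS)

def crawler_cost(run_durations_seconds, dpus):
    """Estimated total charge in CNY over several crawler runs."""
    total_seconds = sum(billed_crawl_seconds(s) for s in run_durations_seconds)
    return dpus * total_seconds / 3600 * CRAWLER_RATE

runs = [90, 240, 1500]  # hypothetical run durations in seconds
billed = [billed_crawl_seconds(s) for s in runs]
total_cost = crawler_cost(runs, dpus=2)  # dpus=2 is an assumed value
print(billed, total_cost)
```

Here the two short runs are each billed as 600 seconds, while the 25-minute run is billed for its actual duration.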
DataBrew interactive sessions
A session is initiated when you open an Amazon Glue DataBrew project. You are billed for the total number of sessions used. Each session lasts 30 minutes. The first 40 interactive sessions are free for first-time users of DataBrew across the China (Beijing) Region, operated by Sinnet, and the China (Ningxia) Region, operated by NWCD. You are billed at the same rate when using DataBrew API operations.
Pricing
- ¥6.53 per DataBrew session
Pricing examples
Amazon Glue DataBrew example: The price for each 30-minute interactive session is ¥6.53. If you use 2 sessions for an Amazon Glue DataBrew project, you will be billed 2 interactive sessions * ¥6.53 per session, or ¥13.06.
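The session charge, including the free sessions for first-time users, can be sketched as below. The function and parameter names are hypothetical, and how many free sessions remain on a given account is left as an input rather than inferred.

```python
SESSION_PRICE = 6.53  # CNY per 30-minute DataBrew session

def databrew_session_cost(sessions_used, free_sessions_remaining=0):
    """Estimated charge in CNY for interactive sessions (hypothetical helper)."""
    # Free sessions (up to 40 for first-time users) are deducted first.
    billable_sessions = max(sessions_used - free_sessions_remaining, 0)
    return billable_sessions * SESSION_PRICE

# The example above: 2 sessions, no free sessions left.
two_sessions = databrew_session_cost(2)
# A first-time user who opens 50 sessions pays for 10 of them.
first_time_user = databrew_session_cost(50, free_sessions_remaining=40)
print(two_sessions, first_time_user)
```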
DataBrew jobs
With Amazon Glue DataBrew, you only pay for the time you use to clean and normalize data when you are running jobs. You are charged an hourly rate based on the number of DataBrew nodes used to run your job. By default, DataBrew allocates 10 nodes to each job. DataBrew job usage is billed in 1-minute increments.
A single Amazon Glue DataBrew node provides 4 vCPU and 16 GB of memory. There are no resources to manage and no upfront costs, and you are not charged for startup or shutdown time.
Pricing
- ¥3.1344 per DataBrew node hour, billed per minute
Additional charges
You may incur additional charges if your Amazon Glue DataBrew jobs use other Amazon services or transfer data. For example, if your DataBrew jobs read data from or write data to Amazon S3, you will be billed for the read and write requests and for the data stored in Amazon S3. For details on Amazon service pricing, see the pricing section of the relevant Amazon service detail pages.
Pricing examples
Amazon Glue DataBrew job example: The price for 1 node hour is ¥3.1344. If an Amazon Glue DataBrew job runs for 10 minutes and consumes 12 DataBrew nodes, it ran for 1/6 of an hour, so you will be billed 12 nodes * 1/6 hour * ¥3.1344 per node hour, or ¥6.2688.
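A minimal sketch of the DataBrew job calculation follows. It assumes that a partial minute is rounded up to a whole minute, which is one reading of "billed per minute" and is not confirmed above. The function name is hypothetical.

```python
import math

NODE_HOUR_RATE = 3.1344  # CNY per DataBrew node hour

def databrew_job_cost(nodes, runtime_seconds):
    """Estimated charge in CNY for one DataBrew job run."""
    # Assumption: partial minutes round up, with at least 1 minute billed.
    billed_minutes = max(math.ceil(runtime_seconds / 60), 1)
    return nodes * billed_minutes / 60 * NODE_HOUR_RATE

# The example above: 12 nodes for 10 minutes.
job_cost = databrew_job_cost(nodes=12, runtime_seconds=10 * 60)
print(job_cost)
```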
Get started building with Amazon Glue on the Amazon Web Services Management Console.