Amazon Glue

Simple, scalable, and serverless data integration

Amazon Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development. Amazon Glue provides all of the capabilities needed for data integration so that you can start analyzing your data and putting it to use in minutes instead of months.

Data integration is the process of preparing and combining data for analytics, machine learning, and application development. It involves multiple tasks, such as discovering and extracting data from various sources; enriching, cleaning, normalizing, and combining data; and loading and organizing data in databases, data warehouses, and data lakes. These tasks are often handled by different types of users, each of whom uses different products.

Amazon Glue provides both visual and code-based interfaces to make data integration easier. Users can easily find and access data using the Amazon Glue Data Catalog. Data engineers and ETL (extract, transform, and load) developers can create and run ETL workflows. Data analysts and data scientists can use Amazon Glue DataBrew to visually enrich, clean, and normalize data without writing code.
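
As a rough illustration of finding data through the Data Catalog from code, the sketch below pages through catalog search results. It assumes the boto3 SDK's Glue client and its `search_tables` call; the helper name `find_tables` and the search term in the usage note are hypothetical. The client is passed in as a parameter so the function can be exercised without credentials.

```python
def find_tables(glue_client, search_text, max_results=25):
    """Return (database, table, location) tuples for catalog tables matching search_text.

    glue_client is expected to behave like boto3.client("glue").
    """
    matches = []
    kwargs = {"SearchText": search_text, "MaxResults": max_results}
    while True:
        page = glue_client.search_tables(**kwargs)
        for table in page.get("TableList", []):
            # Not every table carries a storage location, so fall back to None.
            location = table.get("StorageDescriptor", {}).get("Location")
            matches.append((table["DatabaseName"], table["Name"], location))
        token = page.get("NextToken")
        if not token:
            return matches
        # Ask for the next page of results on the following loop iteration.
        kwargs["NextToken"] = token
```

With credentials configured, a call such as `find_tables(boto3.client("glue"), "orders")` would list the matching tables along with where their data is stored.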

Benefits

Faster Data Integration

Different groups across your organization can use Amazon Glue to work together on data integration tasks, including extraction, cleaning, normalization, combining, loading, and running scalable ETL workflows. This way, you reduce the time it takes to analyze your data and put it to use from months to minutes.

No Servers to Manage

Amazon Glue runs in a serverless environment. There is no infrastructure to manage, and Amazon Glue provisions, configures, and scales the resources required to run your data integration jobs. You pay only for the resources your jobs use while running.

Automate Your Data Integration at Scale

Amazon Glue automates much of the effort required for data integration. Amazon Glue crawls your data sources, identifies data formats, and suggests schemas to store your data. It automatically generates the code to run your data transformations and loading processes. You can use Amazon Glue to easily run and manage thousands of ETL jobs or to combine and replicate data across multiple data stores using SQL.
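
The crawling step described above can be sketched with the boto3 SDK's `create_crawler` and `start_crawler` calls. This is a minimal example under stated assumptions rather than a recommended setup: the helper names are hypothetical, the crawler name, IAM role ARN, database, and S3 path are supplied by the caller, and the schema change policy shown is just one possible choice.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Build the keyword arguments for a Glue create_crawler call on one S3 path."""
    return {
        "Name": name,
        "Role": role_arn,              # IAM role the crawler assumes
        "DatabaseName": database,      # Data Catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Assumed policy: update table definitions in place, only log deletions.
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG",
        },
    }


def create_and_start_crawler(glue_client, request):
    """Register the crawler, then start its first run.

    glue_client is expected to behave like boto3.client("glue").
    """
    glue_client.create_crawler(**request)
    glue_client.start_crawler(Name=request["Name"])
```

Once a crawl finishes, the tables it inferred appear in the named Data Catalog database, where ETL jobs and query engines can reference them.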

How It Works

  • Build Event-Driven ETL Pipelines
    Amazon Glue can run your ETL jobs as new data arrives. For example, you can use an Amazon Lambda function to trigger your ETL jobs to run as soon as new data becomes available in Amazon S3. You can also register this new dataset in the Amazon Glue Data Catalog as part of your ETL jobs.

  • Find Data Across Multiple Data Stores
    You can use the Amazon Glue Data Catalog to quickly discover and search across multiple Amazon data sets without moving the data. Once the data is cataloged, it is immediately available for search and query using Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.

  • Self-Service Visual Data Preparation
    Amazon Glue DataBrew enables you to explore and experiment with data directly from your data lake, data warehouses, and databases, including Amazon S3, Amazon Redshift, Amazon Lake Formation, Amazon Aurora, and Amazon RDS. You can choose from over 250 prebuilt transformations in Amazon Glue DataBrew to automate data preparation tasks, such as filtering anomalies, standardizing formats, and correcting invalid values. After the data is prepared, you can immediately use it for analytics and machine learning.
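
The event-driven pattern in the first item above might look roughly like the following Lambda handler, which starts one Glue job run per object reported in an S3 event notification. It is a sketch under assumptions: the job name `example-etl-job` and the `--source_path` job argument are hypothetical, and the Glue job itself is assumed to exist already. The calls used are the boto3 Glue client's `start_job_run` and the standard S3 event record layout.

```python
from urllib.parse import unquote_plus

JOB_NAME = "example-etl-job"  # hypothetical Glue job name


def start_jobs_for_event(event, glue_client, job_name=JOB_NAME):
    """Start one Glue job run per S3 object in the event and return the run IDs.

    glue_client is expected to behave like boto3.client("glue").
    """
    run_ids = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        response = glue_client.start_job_run(
            JobName=job_name,
            # "--source_path" is an assumed argument the job script would read.
            Arguments={"--source_path": f"s3://{bucket}/{key}"},
        )
        run_ids.append(response["JobRunId"])
    return run_ids


def lambda_handler(event, context):
    """Lambda entry point; boto3 is imported here so the helper stays testable offline."""
    import boto3

    return {"job_run_ids": start_jobs_for_event(event, boto3.client("glue"))}
```

Keeping the Glue call in a helper that receives its client makes the trigger logic easy to check locally with a stub before wiring the function to a bucket notification.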

How to Get Started

Find out How It Works

Learn more about the key features of Amazon Glue.

Explore Amazon Glue features

Sign up for a Free Account

Pay nothing or try for free while learning the fundamentals and building on Amazon Web Services.

Create a Free Account

Connect With an Expert

From development to enterprise-level programs, get the right support at the right time.

Explore support options
