Benefits of AWS Glue
- Less Hassle: AWS Glue is integrated across a wide range of AWS services. It natively supports data stored in Amazon Aurora and the other Amazon Relational Database Service (Amazon RDS) engines, Amazon Redshift, and Amazon S3, as well as common database engines and databases running on Amazon EC2 in our virtual private cloud (Amazon VPC).
- Cost Effective: AWS Glue is serverless. There is no infrastructure to provision or manage. AWS Glue handles the provisioning, configuration, and scaling of the resources required to run our ETL jobs. We pay only for the resources that we use while our jobs are running.
- More Power: AWS Glue automates much of the effort in building, maintaining, and running ETL jobs. It identifies data formats and suggests schemas and transformations. Glue automatically generates the code to execute our data transformations and loading processes.
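To make the "identifies data formats and suggests schemas" point concrete, here is a minimal sketch of schema inference over a small CSV sample. It is a toy illustration written for this article, not the logic AWS Glue actually uses: the function names `infer_type` and `infer_schema`, the three type names, and the widening rule are all assumptions chosen for brevity.

```python
import csv
import io


def infer_type(value):
    """Map a raw string to a coarse column type (toy rules, not Glue's own)."""
    for caster, name in ((int, "bigint"), (float, "double")):
        try:
            caster(value)
            return name
        except ValueError:
            continue
    return "string"


# Types ordered from narrowest to widest; a column takes the widest type seen.
_WIDTH = {"bigint": 0, "double": 1, "string": 2}


def infer_schema(rows):
    """Suggest one type per column from an iterable of dicts of strings."""
    schema = {}
    for row in rows:
        for column, value in row.items():
            candidate = infer_type(value)
            current = schema.get(column, candidate)
            schema[column] = max(current, candidate, key=_WIDTH.__getitem__)
    return schema


# Hypothetical sample data, parsed the way a CSV reader would hand it over.
sample = "id,price,name\n1,9.5,pen\n2,3,42\n"
rows = list(csv.DictReader(io.StringIO(sample)))
print(infer_schema(rows))
# prints {'id': 'bigint', 'price': 'double', 'name': 'string'}
```

The `price` column widens to `double` because one value has a decimal point, and `name` stays `string` even though one value looks numeric. A real service has to handle many more formats and edge cases than this sketch does.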
Introduction To AWS Glue ETL
The Extract, Transform, Load (ETL) process is designed to move data from its source databases into a data warehouse. However, the challenges and complexities of ETL can make it hard to implement successfully for all our enterprise data. For this reason, Amazon introduced AWS Glue.
AWS Glue is a fully managed ETL (extract, transform, and load) service that makes it simple and cost-effective to categorize our data, clean it, enrich it, and move it reliably between various data stores. It consists of three components: a central metadata repository known as the AWS Glue Data Catalog, an ETL engine that automatically generates Python code, and a flexible scheduler that handles dependency resolution and job monitoring. AWS Glue is serverless, which means there is no infrastructure to set up or manage.
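As a rough illustration of how the ETL engine and the scheduler fit together, the sketch below builds the request parameters for a job and a time-based trigger, then passes them to the boto3 Glue client. The helper names (`build_job_request`, `build_schedule_trigger`, `register`), the job name, the role ARN, the S3 path, and the worker settings are all hypothetical placeholders. The `register` function assumes boto3 is installed and AWS credentials are configured, so it is defined but not called here.

```python
def build_job_request(name, role_arn, script_location, workers=2):
    """Parameters for glue.create_job; the values used here are illustrative."""
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        "GlueVersion": "4.0",
        "WorkerType": "G.1X",
        "NumberOfWorkers": workers,
    }


def build_schedule_trigger(name, job_name, cron):
    """Parameters for glue.create_trigger that run job_name on a cron schedule."""
    return {
        "Name": name,
        "Type": "SCHEDULED",
        "Schedule": f"cron({cron})",
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


def register(job_request, trigger_request, region="us-east-1"):
    """Create the job and its trigger. Needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builders above work without it

    glue = boto3.client("glue", region_name=region)
    glue.create_job(**job_request)
    glue.create_trigger(**trigger_request)


if __name__ == "__main__":
    # Placeholder identifiers; replace them with real resources before use.
    job = build_job_request(
        name="nightly-orders-etl",
        role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
        script_location="s3://example-bucket/scripts/orders_etl.py",
    )
    trigger = build_schedule_trigger(
        name="nightly-orders-trigger",
        job_name=job["Name"],
        cron="0 2 * * ? *",  # every day at 02:00 UTC
    )
    print(job)
    print(trigger)
```

Keeping the request builders separate from the call that touches AWS makes it possible to inspect or test the configuration without an AWS account.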