Create an EMR Cluster in AWS Using Terraform: A Practical Step-by-Step Guide
Step 1: Install Terraform
If you haven’t already, install Terraform on your machine. For download and setup instructions, refer to Install Terraform.
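Once Terraform is installed, you can optionally pin the Terraform CLI and AWS provider versions at the top of the main.tf file you create in Step 2, so that everyone running the configuration gets compatible behavior. This is a sketch that assumes Terraform 1.x and the 5.x series of the hashicorp/aws provider; the constraints are illustrative, so adjust them to the versions you actually test with.

```hcl
# Version constraints (assumed values; adjust to the versions you use)
terraform {
  # Require a Terraform 1.x (or later) CLI
  required_version = ">= 1.0"

  required_providers {
    aws = {
      # Official AWS provider from the Terraform Registry
      source = "hashicorp/aws"
      # Accept any 5.x release, but not 6.0 or later
      version = "~> 5.0"
    }
  }
}
```

With this block in place, terraform init (Step 4) downloads a provider release that satisfies the constraint.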
Step 2: Configure AWS Provider
Create a new Terraform configuration file named main.tf. In this file, define the AWS provider and tell it which region and credentials to use. Here’s an example:
provider "aws" {
  region     = "us-east-1" # Replace with your desired AWS region
  access_key = "YOUR_AWS_ACCESS_KEY"
  secret_key = "YOUR_AWS_SECRET_KEY"
}
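Replace YOUR_AWS_ACCESS_KEY and YOUR_AWS_SECRET_KEY with your actual AWS access key and secret key. Hardcoded keys can easily end up in version control, so the safer option is to omit access_key and secret_key and let the provider read credentials from environment variables (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY) or from the shared AWS credentials file (~/.aws/credentials).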
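For example, if you have already run aws configure (or otherwise populated ~/.aws/credentials), you could replace the provider block above with one that references a named profile instead of embedding keys. This is a minimal sketch; the profile name default is an assumption, so substitute whichever profile you use.

```hcl
# Provider configuration without hardcoded keys.
# Credentials are read from the shared files (~/.aws/credentials and
# ~/.aws/config) for the named profile.
provider "aws" {
  region  = "us-east-1" # Replace with your desired AWS region
  profile = "default"   # Assumed profile name; change as needed
}
```

If you prefer environment variables, drop the profile argument entirely and export AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in your shell before running Terraform.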
Step 3: Create EMR cluster
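Open the main.tf file and paste the following Terraform configuration below the provider block. It creates the IAM roles that EMR needs and an EMR cluster with a single master node and a single core node, both using the m5.xlarge instance type. Note that EMR is not part of the AWS Free Tier and does not support t2.micro instances, so this cluster incurs charges for as long as it runs.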
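# IAM role that the EMR service assumes to provision and manage the cluster
resource "aws_iam_role" "emr_service_role" {
  name = "emr_service_role"

  # Trust policy: allow the EMR service to assume this role
  assume_role_policy = <<EOF
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "elasticmapreduce.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

  # AWS managed policy with the permissions EMR needs as a service role
  managed_policy_arns = ["arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceRole"]
}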
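As an alternative to the inline heredoc, trust policies are often generated with the aws_iam_policy_document data source, which builds the JSON for you from HCL blocks. The sketch below shows how the emr_service_role resource above could be rewritten that way; it would replace that resource rather than sit alongside it, and the data source name emr_assume_role is a hypothetical choice.

```hcl
# Trust policy rendered by Terraform instead of a heredoc string.
# "emr_assume_role" is a hypothetical name for this data source.
data "aws_iam_policy_document" "emr_assume_role" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    # Only the EMR service may assume the role
    principals {
      type        = "Service"
      identifiers = ["elasticmapreduce.amazonaws.com"]
    }
  }
}

# Same service role, now referencing the rendered JSON document
resource "aws_iam_role" "emr_service_role" {
  name               = "emr_service_role"
  assume_role_policy = data.aws_iam_policy_document.emr_assume_role.json

  managed_policy_arns = ["arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceRole"]
}
```

The same pattern works for the EC2 instance role by changing the principal identifier to ec2.amazonaws.com.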
# IAM role assumed by the EC2 instances in the cluster
# (passed to the instances through the instance profile below)
resource "aws_iam_role" "emr_ec2_instance_role" {
  name = "emr_ec2_instance_role"

  # Trust policy: allow EC2 instances to assume this role
  assume_role_policy = <<EOF
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}
# Grant the cluster nodes the permissions EMR expects on its EC2 instances
resource "aws_iam_role_policy_attachment" "emr_ec2_instance_role_policy_attachment" {
  role       = aws_iam_role.emr_ec2_instance_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceforEC2Role"
}
# Instance profile that attaches the EC2 role to the cluster nodes
resource "aws_iam_instance_profile" "emr_instance_profile" {
  name = "emr_instance_profile"
  role = aws_iam_role.emr_ec2_instance_role.name
}
# The EMR cluster itself
resource "aws_emr_cluster" "example_cluster" {
  name          = "Example Cluster"
  release_label = "emr-5.32.0"
  applications  = ["Spark", "Hadoop"]
  service_role  = aws_iam_role.emr_service_role.arn

  ec2_attributes {
    # Instance profile used by every node in the cluster
    instance_profile = aws_iam_instance_profile.emr_instance_profile.arn
  }

  # One master node
  master_instance_group {
    instance_type = "m5.xlarge"
  }

  # One core node (runs the HDFS DataNode and YARN NodeManager)
  core_instance_group {
    instance_type  = "m5.xlarge"
    instance_count = 1
  }
}
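Optionally, you can append output blocks to main.tf so that terraform apply prints the identifiers you need to find the cluster afterwards. This sketch relies on the id and master_public_dns attributes exported by the aws_emr_cluster resource; the output names themselves are arbitrary choices.

```hcl
# Cluster ID (of the form j-XXXXXXXXXXXXX), handy for AWS CLI commands
output "emr_cluster_id" {
  value = aws_emr_cluster.example_cluster.id
}

# Public DNS name of the master node
output "emr_master_public_dns" {
  value = aws_emr_cluster.example_cluster.master_public_dns
}
```

After the apply, running terraform output emr_cluster_id prints the value again at any time.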
Step 4: Initialize Terraform
Open your terminal or command prompt, navigate to the directory containing your main.tf file, and run the following command to initialize the working directory and download the AWS provider:
terraform init
Step 5: Review the Execution Plan
Before applying the configuration, you can review the execution plan by running:
terraform plan
Step 6: Apply the Configuration
If the execution plan looks good, apply the configuration by running:
terraform apply
This command will prompt you to confirm the changes. Type yes to proceed, and Terraform will create the IAM roles and the EMR cluster in AWS according to your configuration. Provisioning an EMR cluster usually takes several minutes.
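Step 7: Verify the deployment via the AWS console
Open the Amazon EMR console in the region you configured in the provider and confirm that Example Cluster is listed. Once provisioning finishes, its status should change to Waiting, which means the cluster is up and ready to accept steps.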
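Step 8: Delete the deployment
When you no longer need the EMR cluster, delete it, along with the IAM resources created with it, by running the following command:
terraform destroy
Terraform asks for confirmation before deleting anything; type yes to proceed.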
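Because an idle EMR cluster keeps billing until it is destroyed, you may also want a safety net in case you forget this step. The aws_emr_cluster resource supports an auto_termination_policy block that terminates the cluster after it has been idle for a given number of seconds; EMR auto-termination requires release 5.30.0 or 6.1.0 and later, so emr-5.32.0 qualifies. The sketch below is the Step 3 cluster resource with that block added (it replaces the original resource rather than being added next to it). The one-hour timeout is an assumed value, and support for the block depends on your AWS provider version.

```hcl
# Step 3 cluster with idle auto-termination added (assumed 1-hour timeout)
resource "aws_emr_cluster" "example_cluster" {
  name          = "Example Cluster"
  release_label = "emr-5.32.0"
  applications  = ["Spark", "Hadoop"]
  service_role  = aws_iam_role.emr_service_role.arn

  ec2_attributes {
    instance_profile = aws_iam_instance_profile.emr_instance_profile.arn
  }

  master_instance_group {
    instance_type = "m5.xlarge"
  }

  core_instance_group {
    instance_type  = "m5.xlarge"
    instance_count = 1
  }

  # Terminate the cluster after 3600 seconds (1 hour) without activity
  auto_termination_policy {
    idle_timeout = 3600
  }
}
```

If the cluster terminates this way, the next terraform plan will propose recreating it, so you should still run terraform destroy to clean up the IAM roles and instance profile.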
How to Create an EMR Cluster in AWS Using Terraform?
In today’s data-driven world, big data processing has become an integral part of many organizations’ workflows. Amazon EMR (Elastic MapReduce) is a cloud-based platform provided by Amazon Web Services (AWS) that simplifies the process of running and scaling Apache Hadoop and Apache Spark clusters for big data processing. EMR takes care of provisioning compute resources, installing and configuring the required software, and managing the cluster lifecycle, allowing you to focus on your data processing tasks rather than the underlying infrastructure.
While you can create an EMR cluster using the AWS Management Console or Command Line Interface (CLI), managing infrastructure as code with Terraform offers several advantages. Terraform is an open-source Infrastructure as Code (IaC) tool that enables you to define, provision, and manage your cloud infrastructure resources in a consistent, repeatable, and version-controlled manner.