EMR - Production AutoScaling Rules


As we all know, autoscaling means adding processing power when it is needed and removing idle instances when it is not. Why pay for capacity you are not using?

In this blog, I will discuss some important custom autoscaling policies for your production-ready EMR cluster.

AWS announced Amazon EMR managed scaling in version 5.30.1 and above, which means that if you don't want to worry about custom autoscaling rules in your production EMR cluster, you can opt in to managed scaling. This feature promises to resize your cluster for the best performance at the lowest possible cost. But sometimes we want the power in our own hands instead of relying on the Amazon EMR managed algorithm, which constantly monitors key workload metrics and optimizes the cluster size for best resource utilization. We are a little greedy here: we want to manage upscaling and downscaling according to our own requirements. Writing the rules ourselves not only gives a clearer picture, making the rules simpler and more understandable, but also gives us the power to configure them to our needs.

In your production EMR cluster, you can start with managed policies and later move to a custom policy, once you understand the cluster and its load well enough to get the most performance out of it.

This is what a custom autoscaling rule looks like.

[Image: Autoscaling rule (source: Amazon)]

There is a whole set of metrics provided by Amazon EMR that can be used either for autoscaling rules or for setting up CloudWatch alarms, alerts, and dashboards to monitor cluster health.

I will not discuss the whole set of policies you can configure in your cluster. I will cover only the few that must be there and are sufficient for a production-ready cluster.

Your cluster will typically contain these instance groups: Master, Core, Task, and Task-Spot (Spot Instances for the task nodes). I will not consider a core-spot group in this discussion, just to keep things simple.

TASK and TASK-SPOT instance groups
There is a single metric from which you can regulate autoscaling for your task instance groups: YARNMemoryAvailablePercentage.
You can set up simple rules like

“Add 1 task instance if YARNMemoryAvailablePercentage is less than 25 for 1 five-minute period with a cooldown of 300 seconds”
And similarly for scale-down: "Terminate 1 instance if YARNMemoryAvailablePercentage is greater than or equal to 75 for 1 five-minute period with a cooldown of 300 seconds". For Spot Instances, the thresholds should be set a little lower than for On-Demand Instances.
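
Here is a minimal sketch of how these two rules might be attached to a task instance group with boto3's put_auto_scaling_policy; the cluster ID, instance group ID, region, and capacity limits are placeholders you would replace with your own:

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")  # region is a placeholder

emr.put_auto_scaling_policy(
    ClusterId="j-XXXXXXXXXXXXX",         # hypothetical cluster ID
    InstanceGroupId="ig-XXXXXXXXXXXXX",  # hypothetical task instance group ID
    AutoScalingPolicy={
        "Constraints": {"MinCapacity": 1, "MaxCapacity": 10},
        "Rules": [
            {
                "Name": "ScaleOutOnLowYarnMemory",
                "Description": "Add 1 task instance when YARN memory is scarce",
                "Action": {
                    "SimpleScalingPolicyConfiguration": {
                        "AdjustmentType": "CHANGE_IN_CAPACITY",
                        "ScalingAdjustment": 1,
                        "CoolDown": 300,
                    }
                },
                "Trigger": {
                    "CloudWatchAlarmDefinition": {
                        "ComparisonOperator": "LESS_THAN",
                        "EvaluationPeriods": 1,       # 1 five-minute period
                        "MetricName": "YARNMemoryAvailablePercentage",
                        "Namespace": "AWS/ElasticMapReduce",
                        "Period": 300,
                        "Statistic": "AVERAGE",
                        "Threshold": 25.0,
                        "Unit": "PERCENT",
                    }
                },
            },
            {
                "Name": "ScaleInOnHighYarnMemory",
                "Description": "Terminate 1 task instance when memory is plentiful",
                "Action": {
                    "SimpleScalingPolicyConfiguration": {
                        "AdjustmentType": "CHANGE_IN_CAPACITY",
                        "ScalingAdjustment": -1,
                        "CoolDown": 300,
                    }
                },
                "Trigger": {
                    "CloudWatchAlarmDefinition": {
                        "ComparisonOperator": "GREATER_THAN_OR_EQUAL",
                        "EvaluationPeriods": 1,
                        "MetricName": "YARNMemoryAvailablePercentage",
                        "Namespace": "AWS/ElasticMapReduce",
                        "Period": 300,
                        "Statistic": "AVERAGE",
                        "Threshold": 75.0,
                        "Unit": "PERCENT",
                    }
                },
            },
        ],
    },
)
```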

CORE
Core nodes have two important responsibilities in the cluster.

1> HDFS
If you are using HDFS as your data storage, then you must specify upscaling rules according to HDFS usage. But most of us use S3 as data storage instead of HDFS, so HDFS ends up holding only YARN logs and Spark application logs. In that case, set some configuration to keep the logs in check, such as "spark.history.fs.cleaner.maxAge" and "yarn.log-aggregation.retain-seconds"; it is simply not worth burning HDFS capacity just for logs. Also, you can set "dfs.replication" to 1, or to whatever number suits your requirements. This matters because EMR chooses a default replication factor based on cluster size. So when you are creating a full-blown cluster, it is best to set these properties at cluster creation time.
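
A minimal sketch of these settings as an EMR Configurations block (the kind you would pass at cluster creation, e.g. to boto3's run_job_flow); the retention values are illustrative, not recommendations:

```python
# Configuration classifications applied at cluster creation time.
configurations = [
    {
        "Classification": "spark-defaults",
        "Properties": {
            # Clean up Spark history files older than 3 days (illustrative value)
            "spark.history.fs.cleaner.enabled": "true",
            "spark.history.fs.cleaner.maxAge": "3d",
        },
    },
    {
        "Classification": "yarn-site",
        "Properties": {
            # Keep aggregated YARN logs for 2 days (172800 seconds, illustrative)
            "yarn.log-aggregation.retain-seconds": "172800",
        },
    },
    {
        "Classification": "hdfs-site",
        "Properties": {
            # Pin the replication factor instead of letting EMR pick one by size
            "dfs.replication": "1",
        },
    },
]
```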

But you must add an upscaling rule for when HDFS utilization goes up. The reason: if HDFS utilization reaches around 90% on any node of the cluster, that node is marked unhealthy and kicked out of the cluster to keep the cluster stable, which may cause loss of any data on HDFS and the failure of any application running on that node.

Add 1 instance if HDFSUtilization is greater than 75 for 1 five-minute period with a cooldown of 300 seconds. The opposite rule can be applied for downscaling if you are using HDFS as your data storage; otherwise there is no need for a downscale rule w.r.t. HDFSUtilization.
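
Expressed in the same rule structure as the task-group policy above, the scale-out rule might look like this sketch (to be appended to the core group's Rules list):

```python
# Scale-out rule for the core instance group, triggered on HDFS usage.
hdfs_scale_out_rule = {
    "Name": "ScaleOutOnHdfsUtilization",
    "Description": "Add 1 core instance before HDFS fills up",
    "Action": {
        "SimpleScalingPolicyConfiguration": {
            "AdjustmentType": "CHANGE_IN_CAPACITY",
            "ScalingAdjustment": 1,
            "CoolDown": 300,
        }
    },
    "Trigger": {
        "CloudWatchAlarmDefinition": {
            "ComparisonOperator": "GREATER_THAN",
            "EvaluationPeriods": 1,  # 1 five-minute period
            "MetricName": "HDFSUtilization",
            "Namespace": "AWS/ElasticMapReduce",
            "Period": 300,
            "Statistic": "AVERAGE",
            "Threshold": 75.0,
            "Unit": "PERCENT",
        }
    },
}
```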

2> Application driver containers must run on core nodes.
So if your core nodes have no breathing room, you cannot start or schedule a new application; it will sit in the ACCEPTED state forever.
To avoid building a huge backlog of applications in the ACCEPTED state on a production cluster, just add a simple rule that adds an instance whenever containers are pending.

Add 1 instance if ContainerPending is greater than or equal to 1 for 1 five-minute period with a cooldown of 300 seconds. This rule will not help in downscaling the cluster: you can have a thousand applications running at a single point in time with no pending applications, so "no pending applications" alone is no reason to downscale your cluster.
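
The trigger for this rule might look like the following sketch; the action is the same add-one-instance, 300-second-cooldown action shown earlier:

```python
# Trigger definition for the ContainerPending scale-out rule.
container_pending_trigger = {
    "CloudWatchAlarmDefinition": {
        "ComparisonOperator": "GREATER_THAN_OR_EQUAL",
        "EvaluationPeriods": 1,  # 1 five-minute period
        "MetricName": "ContainerPending",
        "Namespace": "AWS/ElasticMapReduce",
        "Period": 300,
        "Statistic": "AVERAGE",
        "Threshold": 1.0,
        "Unit": "COUNT",
    }
}
```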

Now the question arises: how will you downscale your cluster when the load varies?
The answer is ContainerPendingRatio.

ContainerPendingRatio = ContainerPending / ContainerAllocated
ContainerPending = containers waiting to be executed; when your application is in the ACCEPTED state, all of its containers are pending.
ContainerAllocated = containers that are currently running and have resources allocated.

You cannot just pick one number for ContainerPendingRatio and expect it to solve your problem. For example, 5 pending containers against 20 allocated ones gives a ratio of 0.25, but whether that is high depends on your workload. To reach an ideal threshold, first analyze your metrics; only then can you set good upscaling and downscaling rules on this metric.

Add 1 instance if ContainerPendingRatio is greater than 0 for 1 five-minute period with a cooldown of 300 seconds

Terminate 1 instance if ContainerPendingRatio is less than or equal to 0 for 5 five-minute periods with a cooldown of 300 seconds
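
Put together, the pair of rules above might look like this sketch, again in the same policy structure (tune the thresholds against your own metrics):

```python
# Scale-out/scale-in rule pair for the core group on ContainerPendingRatio.
container_pending_ratio_rules = [
    {
        "Name": "ScaleOutOnPendingContainers",
        "Description": "Add 1 core instance while containers are queuing up",
        "Action": {
            "SimpleScalingPolicyConfiguration": {
                "AdjustmentType": "CHANGE_IN_CAPACITY",
                "ScalingAdjustment": 1,
                "CoolDown": 300,
            }
        },
        "Trigger": {
            "CloudWatchAlarmDefinition": {
                "ComparisonOperator": "GREATER_THAN",
                "EvaluationPeriods": 1,  # 1 five-minute period
                "MetricName": "ContainerPendingRatio",
                "Namespace": "AWS/ElasticMapReduce",
                "Period": 300,
                "Statistic": "AVERAGE",
                "Threshold": 0.0,
            }
        },
    },
    {
        "Name": "ScaleInOnNoPendingContainers",
        "Description": "Terminate 1 core instance after a quiet 25 minutes",
        "Action": {
            "SimpleScalingPolicyConfiguration": {
                "AdjustmentType": "CHANGE_IN_CAPACITY",
                "ScalingAdjustment": -1,
                "CoolDown": 300,
            }
        },
        "Trigger": {
            "CloudWatchAlarmDefinition": {
                "ComparisonOperator": "LESS_THAN_OR_EQUAL",
                "EvaluationPeriods": 5,  # 5 consecutive five-minute periods
                "MetricName": "ContainerPendingRatio",
                "Namespace": "AWS/ElasticMapReduce",
                "Period": 300,
                "Statistic": "AVERAGE",
                "Threshold": 0.0,
            }
        },
    },
]
```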

Finally, never forget to put alarms on your cluster; they will be lifesavers. Alarm when the number of running core nodes (CoreNodesRunning) reaches its configured maximum, likewise when running task nodes (TaskNodesRunning) reach their maximum, and on HDFS utilization, which can make nodes unhealthy and get them kicked out of the cluster. Enjoy playing with EMR.
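
A minimal sketch of one such alarm on HDFS utilization, using boto3's CloudWatch client; the alarm name, threshold, cluster ID, and SNS topic ARN are placeholders:

```python
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")  # placeholder region

cloudwatch.put_metric_alarm(
    AlarmName="emr-hdfs-utilization-high",  # hypothetical alarm name
    Namespace="AWS/ElasticMapReduce",
    MetricName="HDFSUtilization",
    # EMR cluster metrics are dimensioned by JobFlowId (the cluster ID)
    Dimensions=[{"Name": "JobFlowId", "Value": "j-XXXXXXXXXXXXX"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=1,
    Threshold=80.0,  # alert before the ~90% unhealthy-node cutoff
    ComparisonOperator="GreaterThanThreshold",
    # Placeholder SNS topic that pages the on-call channel
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:emr-alerts"],
)
```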
