When you create an Amazon EMR cluster, three configuration choices deserve attention: which instance configuration to use, whether auto-scaling is required, and how spot instances can help. These three options are interrelated and can be confusing.

1. Instance Configuration

This involves choosing the instance types for the master, core, and task nodes of the cluster. There are two ways to do it.

Uniform Instance Groups

This is the default and simplest option. A cluster can have a maximum of 50 instance groups: one master instance group, one core instance group, and up to 48 optional task instance groups. Every node in an instance group has the same instance type, and that type cannot be changed once the group is created. To use a different instance type, add another task instance group.
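A minimal sketch of such a layout in Python follows. The dictionary keys follow the shape of the instance-group entries that boto3's EMR run_job_flow call accepts; the group names, instance types, counts, and the helper build_instance_groups are illustrative assumptions, not part of any AWS API.

```python
# Sketch of the instance-group list for an EMR cluster request.
# Group names, instance types, and counts are illustrative assumptions.

MAX_INSTANCE_GROUPS = 50  # 1 master + 1 core + up to 48 task groups


def build_instance_groups(task_groups):
    """Return instance-group dicts: master, core, then one per task group.

    task_groups is a list of (instance_type, instance_count) tuples.
    """
    groups = [
        {"Name": "master", "InstanceRole": "MASTER",
         "InstanceType": "m5.xlarge", "InstanceCount": 1},
        {"Name": "core", "InstanceRole": "CORE",
         "InstanceType": "r5.2xlarge", "InstanceCount": 2},
    ]
    # Each task group has exactly one instance type; a different type
    # means a separate group.
    for index, (instance_type, count) in enumerate(task_groups, start=1):
        groups.append({
            "Name": f"task-{index}",
            "InstanceRole": "TASK",
            "InstanceType": instance_type,
            "InstanceCount": count,
        })
    if len(groups) > MAX_INSTANCE_GROUPS:
        raise ValueError("a cluster can have at most 50 instance groups")
    return groups


instance_groups = build_instance_groups([("m5.4xlarge", 2), ("c5.4xlarge", 4)])
for group in instance_groups:
    print(group["InstanceRole"], group["InstanceType"], group["InstanceCount"])
```

The resulting list could be passed as the InstanceGroups entry of a cluster request; the rest of the request (release label, roles, applications) is left out here.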

Instance Fleet

This option offers the widest variety of provisioning options. Each node type (master, core, task) has a single instance fleet, and the task instance fleet is optional. For each instance fleet, you can specify up to five instance types (e.g., r5.2xlarge, m5.4xlarge). For core and task instance fleets, you assign a target capacity, which can be counted in vCPUs or in EC2 instances, depending on the selected option. Amazon EMR then provisions any mix of the specified instance types that fulfills the target capacity.
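To make the target-capacity idea concrete, here is a small sketch. The keys (InstanceFleetType, TargetOnDemandCapacity, InstanceTypeConfigs, WeightedCapacity) follow the instance-fleet shape used by boto3's EMR client; the helpers build_core_fleet and units_provided, the fleet name, and the 32-unit target are assumptions made for illustration. Weights are set to each type's vCPU count, which is one possible choice.

```python
# Sketch of a core instance fleet. Instance types, weights, and the target
# capacity are illustrative assumptions, not recommendations.

MAX_TYPES_PER_FLEET = 5  # limit described in the text above


def build_core_fleet(type_weights, target_units):
    """Build a core fleet from a {instance_type: weighted_capacity} mapping."""
    if len(type_weights) > MAX_TYPES_PER_FLEET:
        raise ValueError("too many instance types for one fleet")
    return {
        "Name": "core-fleet",
        "InstanceFleetType": "CORE",
        "TargetOnDemandCapacity": target_units,
        "InstanceTypeConfigs": [
            {"InstanceType": itype, "WeightedCapacity": weight}
            for itype, weight in type_weights.items()
        ],
    }


def units_provided(fleet, instance_counts):
    """Capacity units delivered by a given mix of instances."""
    weights = {cfg["InstanceType"]: cfg["WeightedCapacity"]
               for cfg in fleet["InstanceTypeConfigs"]}
    return sum(weights[itype] * n for itype, n in instance_counts.items())


# Weights mirror vCPU counts: r5.2xlarge has 8 vCPUs, m5.4xlarge has 16.
fleet = build_core_fleet({"r5.2xlarge": 8, "m5.4xlarge": 16}, target_units=32)

# Two different mixes that both meet the 32-unit target.
print(units_provided(fleet, {"r5.2xlarge": 4}))                   # 32
print(units_provided(fleet, {"r5.2xlarge": 2, "m5.4xlarge": 1}))  # 32
```

Either mix satisfies the same target, which is why the service is free to pick whichever instance types are available.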

2. Auto Scaling

Auto-scaling automatically scales core and task nodes out and in according to a scaling policy. This helps EMR handle workload spikes caused by high data volumes, traffic, or other reasons. A policy has scale-out rules that define when to add nodes and scale-in rules that define when to remove them. A simple example: add 2 nodes when available YARN memory falls below 15%, and remove 2 nodes when it rises above 75%. Many other metrics can trigger adding or removing nodes. Policy-based auto-scaling of this kind is not available for instance fleet configurations. With uniform instance groups, each core and task instance group can have its own scaling policy.
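The example policy can be sketched in Python as follows. The nested keys (Constraints, Rules, SimpleScalingPolicyConfiguration, CloudWatchAlarmDefinition) and the metric name YARNMemoryAvailablePercentage follow the EMR automatic scaling policy format; the capacity limits, cooldown, and the helpers yarn_memory_rule and capacity_change are assumptions for illustration. The evaluator is a toy that mimics the rule logic locally; it is not how EMR itself evaluates alarms, and a real policy would usually carry more alarm settings.

```python
# Sketch of a scaling policy for one instance group: add 2 nodes when
# available YARN memory is below 15%, remove 2 when it is above 75%.

def yarn_memory_rule(name, operator, threshold, adjustment):
    """Build one scaling rule keyed on available YARN memory."""
    return {
        "Name": name,
        "Action": {
            "SimpleScalingPolicyConfiguration": {
                "AdjustmentType": "CHANGE_IN_CAPACITY",
                "ScalingAdjustment": adjustment,  # negative values scale in
                "CoolDown": 300,                  # seconds, assumed value
            }
        },
        "Trigger": {
            "CloudWatchAlarmDefinition": {
                "ComparisonOperator": operator,
                "EvaluationPeriods": 1,
                "MetricName": "YARNMemoryAvailablePercentage",
                "Namespace": "AWS/ElasticMapReduce",
                "Period": 300,
                "Statistic": "AVERAGE",
                "Threshold": threshold,
                "Unit": "PERCENT",
            }
        },
    }


policy = {
    "Constraints": {"MinCapacity": 2, "MaxCapacity": 10},  # assumed limits
    "Rules": [
        yarn_memory_rule("scale-out", "LESS_THAN", 15.0, 2),
        yarn_memory_rule("scale-in", "GREATER_THAN", 75.0, -2),
    ],
}


def capacity_change(policy, available_pct):
    """Toy local evaluator: node-count change for one metric reading."""
    for rule in policy["Rules"]:
        alarm = rule["Trigger"]["CloudWatchAlarmDefinition"]
        if alarm["ComparisonOperator"] == "LESS_THAN":
            breached = available_pct < alarm["Threshold"]
        else:
            breached = available_pct > alarm["Threshold"]
        if breached:
            return rule["Action"]["SimpleScalingPolicyConfiguration"]["ScalingAdjustment"]
    return 0


print(capacity_change(policy, 10.0))  # 2
print(capacity_change(policy, 50.0))  # 0
print(capacity_change(policy, 80.0))  # -2
```

Keeping a wide gap between the two thresholds avoids the cluster flapping between scale-out and scale-in.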

3. Spot Instances

EMR can use on-demand instances as well as spot instances. Spot instances are unused EC2 capacity offered below the on-demand price, sometimes up to 90% less. When you choose spot instances, you set the maximum price you are willing to pay; if capacity is available, you pay the current spot price, which is not necessarily your maximum. However, the instances can be reclaimed if the spot price rises above your maximum or when EC2 needs the capacity back. Both uniform instance groups and instance fleets can use spot instances. In the instance fleet configuration, you specify how much of the target capacity should be on-demand and how much should be spot. It can be a mix, and Amazon EMR fulfills it based on availability.
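As a rough illustration of a mixed fleet, the sketch below defines a task fleet with both an on-demand and a spot target. The keys (TargetOnDemandCapacity, TargetSpotCapacity, BidPriceAsPercentageOfOnDemandPrice, LaunchSpecifications, SpotSpecification) follow the instance-fleet shape in boto3's EMR client; all numbers, the fleet name, and the helper spot_fraction are assumptions for illustration.

```python
# Sketch of a task fleet that mixes on-demand and spot capacity.
# All capacities, weights, and percentages are illustrative assumptions.

task_fleet = {
    "Name": "task-fleet",
    "InstanceFleetType": "TASK",
    "TargetOnDemandCapacity": 8,   # units that must be on-demand
    "TargetSpotCapacity": 24,      # units to fill from spot if available
    "InstanceTypeConfigs": [
        {
            "InstanceType": "r5.2xlarge",
            "WeightedCapacity": 8,
            # Maximum spot price, as a percentage of the on-demand price.
            "BidPriceAsPercentageOfOnDemandPrice": 60.0,
        },
        {
            "InstanceType": "m5.4xlarge",
            "WeightedCapacity": 16,
            "BidPriceAsPercentageOfOnDemandPrice": 60.0,
        },
    ],
    "LaunchSpecifications": {
        "SpotSpecification": {
            # If spot capacity is not provisioned in time, fall back.
            "TimeoutDurationMinutes": 20,
            "TimeoutAction": "SWITCH_TO_ON_DEMAND",
        }
    },
}


def spot_fraction(fleet):
    """Share of the fleet's total target capacity requested as spot."""
    on_demand = fleet.get("TargetOnDemandCapacity", 0)
    spot = fleet.get("TargetSpotCapacity", 0)
    total = on_demand + spot
    return spot / total if total else 0.0


print(spot_fraction(task_fleet))  # 0.75
```

Here three quarters of the requested capacity is spot, with a fallback to on-demand if spot capacity cannot be provisioned within the timeout.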

In the uniform instance group configuration, each instance group uses either on-demand or spot instances. Spot instances suit non-critical workloads, in both production and non-production clusters. Choosing a spot instance for the master node is risky: if it is reclaimed, the whole cluster is terminated. Running all core nodes on spot instances is also problematic, because core nodes store HDFS data and losing them can mean losing that data. A good approach is therefore to use a mix of on-demand and spot instances for core nodes and all spot instances for task nodes.
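The guidance above can be turned into a small sanity check. This is only a sketch: the Market values ON_DEMAND and SPOT and the InstanceRole values match the instance-group fields used by boto3's EMR client, while check_spot_layout, the warning texts, and the sample layouts are hypothetical. Since a single instance group uses one purchasing option, the safe sample layout keeps the core group on-demand.

```python
# Sketch of a checker for risky spot choices in a uniform-group layout.
# The layouts and warning messages are illustrative assumptions.

def check_spot_layout(groups):
    """Return a list of warnings for spot usage on master or core groups."""
    warnings = []
    for group in groups:
        if group["Market"] != "SPOT":
            continue
        if group["InstanceRole"] == "MASTER":
            warnings.append("master on spot: losing it terminates the cluster")
        elif group["InstanceRole"] == "CORE":
            warnings.append("core on spot: HDFS data on these nodes can be lost")
        # Task groups hold no HDFS data, so spot is acceptable there.
    return warnings


safe_layout = [
    {"InstanceRole": "MASTER", "Market": "ON_DEMAND", "InstanceType": "m5.xlarge"},
    {"InstanceRole": "CORE", "Market": "ON_DEMAND", "InstanceType": "r5.2xlarge"},
    {"InstanceRole": "TASK", "Market": "SPOT", "InstanceType": "m5.4xlarge"},
]

risky_layout = [
    {"InstanceRole": "MASTER", "Market": "SPOT", "InstanceType": "m5.xlarge"},
    {"InstanceRole": "CORE", "Market": "SPOT", "InstanceType": "r5.2xlarge"},
]

print(check_spot_layout(safe_layout))   # []
print(len(check_spot_layout(risky_layout)))  # 2
```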

Conclusion

Uniform instance group configurations are straightforward and support both spot instances and auto-scaling, which are essential features for critical workloads. Instance fleets, while not supporting policy-based auto-scaling, work well with spot instances because they let you add a wide range of instance types to a single fleet. Production and other critical workloads that need auto-scaling can use uniform instance groups, with spot instances if required, while other use cases can go with instance fleets.