One of the biggest advantages of the cloud is that it offers far better workload scaling than on-premises infrastructure. It supports both vertical and horizontal scaling, with a wide range of sizing options. However, many organizations don’t understand the difference between vertical and horizontal cloud scaling, so they don’t realize why cloud computing is a game-changer. On the cloud, teams don’t have to deal with rigid hardware constraints or unoptimized capacity. They can match compute to demand in near real time, a tremendous advantage in our fast-paced world.

In this post, we’ll define vertical and horizontal scaling and explain how AWS users can achieve both at the same time. We’ll also touch on how to implement auto-scaling on AWS so that cloud engineers never have to worry about managing compute capacity themselves. With the stage set, let’s dive into horizontal vs. vertical scaling.

What is Vertical Scaling?

Vertical scaling refers to the process of increasing compute, memory, or I/O by boosting the capacity of an existing machine. To use an analogy, increasing vertical capacity is like replacing an inexperienced call center worker who takes 5 calls per hour with a more experienced hire who gets through 10 per hour. In this case, we’ve increased the capacity of an existing role (i.e., the call center worker) to handle higher throughput.

On-premises, vertical scaling means adding hardware to an existing server, or replacing its components with more capable ones, so that it can handle the same workloads better and faster. This may require ordering parts or a whole new machine from a supplier, installing it in a rack cabinet, and configuring it correctly, which can take days or even weeks.

Vertical scaling on the cloud, however, is as simple as selecting a different instance type (the hardware profile of a virtual server) with more capacity and letting AWS handle all the hardware changes behind the scenes. AWS offers a massive selection of instance types to fit a wide range of use cases, including general-purpose, memory-optimized, storage-optimized, compute-optimized, and accelerated computing instances. AWS users can choose instance types that fit their needs and upgrade or downgrade in minutes, if necessary, although an instance typically has to be stopped while its type is changed.
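
To make the resize step concrete, here is a minimal sketch of changing an instance type programmatically. It assumes an EBS-backed EC2 instance and a boto3-style EC2 client passed in as `ec2`; the function name `resize_instance` and the example IDs are hypothetical, and a real script would also need error handling.

```python
def resize_instance(ec2, instance_id, new_type):
    """Stop an instance, change its type, and start it again.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    The instance is unavailable between the stop and the start.
    """
    # The instance type can only be changed while the instance is stopped.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # Swap the hardware profile, e.g. from m5.large to m5.xlarge.
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": new_type},
    )

    # Bring the instance back up on the new instance type.
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   resize_instance(boto3.client("ec2"), "i-0123456789abcdef0", "m5.xlarge")
```

The stop and start around the change are why vertical scaling on a single instance usually involves a short outage.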

What is Horizontal Scaling?

Horizontal scaling means adding new servers to help lighten the compute, memory, or I/O load on existing servers. Building on the call center example from earlier, this would mean adding more call center employees rather than swapping the junior-level employee with a more senior person. We’ve again increased our capacity for work, but this time in a different way.

On-premises, horizontal scaling runs into the same issues as vertical scaling. IT leads have to order new servers, find rack space (or expand physical data center capacity), install the hardware, and then prepare the servers for existing workloads. They also have to ensure that the load is evenly balanced across all the servers. All of this takes too long for organizations that need to meet unpredictable demand or absorb sudden changes in traffic.

Fortunately, AWS simplifies horizontal scaling as well. Spinning up new instances and plugging them into existing architectures takes minutes. IT workers don’t have to deal with any of the hardware, physical space constraints, or maintenance. Instances can be added dynamically in response to increased workload, or manually in advance to meet forecasted demand. So IT teams have a choice: they can move to larger instance types until the largest one is maxed out, or, if they can’t or don’t want to scale vertically, they can scale horizontally to increase capacity.
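
As a rough illustration of scaling out manually ahead of forecasted demand, the sketch below estimates how many instances a given load needs and then asks an Auto Scaling group for that many. The helper names, the `headroom` parameter, and the capacity figures are assumptions for illustration; `autoscaling` is assumed to be a boto3-style Auto Scaling client.

```python
import math


def instances_needed(expected_load, per_instance_capacity, headroom=0.25):
    """Estimate the instance count for a load, with some spare headroom.

    Both arguments use the same unit, e.g. requests per second
    (or calls per hour, in the call center analogy).
    """
    if per_instance_capacity <= 0:
        raise ValueError("per_instance_capacity must be positive")
    required = expected_load * (1 + headroom) / per_instance_capacity
    return max(1, math.ceil(required))


def scale_out_to(autoscaling, group_name, desired):
    """Set the group's desired capacity ahead of forecasted demand.

    The value must lie within the group's configured min and max size,
    otherwise the API call is rejected.
    """
    autoscaling.set_desired_capacity(
        AutoScalingGroupName=group_name,
        DesiredCapacity=desired,
        HonorCooldown=False,
    )


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   count = instances_needed(expected_load=1000, per_instance_capacity=150)
#   scale_out_to(boto3.client("autoscaling"), "web-asg", count)
```

In call center terms, 45 calls per hour at 10 calls per worker and no headroom works out to 5 workers.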

Things get really exciting when cloud engineers combine vertical and horizontal scaling. This is how teams find the happy medium between budgetary limitations and performance requirements. The way to accomplish this is through auto-scaling.

Auto Scaling with AWS

Auto scaling on AWS means letting AWS add, remove, or replace server instances based on predefined policies. AWS offers the following dynamic auto-scaling policies:

  • Simple scaling: scale up or down by a single scaling adjustment when an alarm fires
  • Step scaling: scale up or down by one of several scaling adjustments, chosen by how far the metric is past the alarm threshold
  • Target tracking: scale up or down to keep some metric, like average CPU utilization, close to a target value
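
The difference between simple and step scaling is easier to see in code. The following is a simplified model, not AWS's implementation: each step is a hypothetical `(lower, upper, adjustment)` tuple whose bounds are offsets from the alarm threshold, with the lower bound inclusive and the upper bound exclusive. Simple scaling corresponds to a single step with no upper bound.

```python
def step_adjustment(metric_value, threshold, steps):
    """Return the capacity change for the step that contains the breach.

    steps: list of (lower, upper, adjustment) tuples. Bounds are offsets
    from the threshold; None means unbounded on that side.
    Returns 0 when no step matches (the alarm has not been breached).
    """
    breach = metric_value - threshold
    for lower, upper, adjustment in steps:
        above_lower = lower is None or breach >= lower
        below_upper = upper is None or breach < upper
        if above_lower and below_upper:
            return adjustment
    return 0


# Step scaling: a bigger breach of the 60% CPU threshold adds more instances.
STEP_POLICY = [(0, 10, 1), (10, 20, 2), (20, None, 3)]

# Simple scaling: one adjustment, however large the breach.
SIMPLE_POLICY = [(0, None, 1)]
```

With a threshold of 60, `step_adjustment(65, 60, STEP_POLICY)` returns 1, a value of 75 returns 2, and 95 returns 3, while `SIMPLE_POLICY` returns 1 in all three cases. A value of 50 is below the threshold and returns 0 for both.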

The right cloud scaling policy depends on the nature of the application, as does whether to scale vertically or horizontally. In many cases, organizations choose to scale horizontally to avoid having to stop existing instances in order to replace them with a different type.

Auto-scaling is especially powerful when combined with AWS load balancers that automatically check the health of individual instances. With the right load balancer and auto-scaling combination, engineering teams can distribute traffic evenly across many instances and automatically scale up or down, vertically or horizontally, based on key metrics. On-premises, this is virtually impossible. On AWS, it’s easy.
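
As a minimal sketch of that combination, the function below switches an existing Auto Scaling group to load balancer health checks and attaches a target tracking policy on average CPU. It assumes the group is already attached to a load balancer and that `autoscaling` is a boto3-style Auto Scaling client; the function name, policy name, grace period, and 50% target are illustrative choices, not recommendations.

```python
def configure_group(autoscaling, group_name, target_cpu=50.0):
    """Use load balancer health checks and track average CPU for a group."""
    # Replace instances that the load balancer reports as unhealthy,
    # after giving new instances 300 seconds to warm up.
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        HealthCheckType="ELB",
        HealthCheckGracePeriod=300,
    )

    # Add or remove instances to keep the group's average CPU near the target.
    autoscaling.put_scaling_policy(
        AutoScalingGroupName=group_name,
        PolicyName="cpu-target-tracking",
        PolicyType="TargetTrackingScaling",
        TargetTrackingConfiguration={
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": target_cpu,
        },
    )


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   configure_group(boto3.client("autoscaling"), "web-asg", target_cpu=50.0)
```

Note that a policy like this changes the number of instances in the group. Changing the instance type the group launches is a separate configuration step.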

