This is the first post of a 2-part series about scaling your cloud applications. In this post, we will go over some basic concepts of how different scaling strategies are appropriate for different components. This will set the stage for part 2, where we will dive into the goal of linear scaling, and specific tools and patterns that you can use to meet your scaling needs!
In modern software applications, designing for scale is becoming more and more of a necessity. As your application’s usage increases, you may need to handle traffic volumes that were almost inconceivable just a few years ago.
So, what steps can you take to ensure the new application you are designing will be able to grow along with its usage?
What Is Scaling?
Scaling refers to increasing or decreasing the amount of resources available to your application so that it can handle varying amounts of load. This may involve any type of compute resources: CPU, memory, disk, network I/O, etc.; it depends on the characteristics of your application.
The art of scaling is a delicate balance between cost and utilization. If you aggressively scale up your resources ahead of time, your costs will be higher; paying for more capacity than you actually use is called “overprovisioning,” and too much of it can make your application unprofitable. But if you do not scale up enough to handle your incoming traffic, users may experience poor performance or outages, which can hurt your reputation and cost you customers.
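To make the cost/utilization trade-off concrete, here is a back-of-the-envelope sketch in Python. The vCPU counts and the per-vCPU price are made-up numbers for illustration, not real cloud rates:

```python
# Illustrative cost/utilization math; every number here is hypothetical.
PROVISIONED_VCPUS = 64      # capacity you are paying for
PEAK_DEMAND_VCPUS = 40      # capacity your traffic actually needs at peak
COST_PER_VCPU_HOUR = 0.05   # assumed hourly price per vCPU

utilization = PEAK_DEMAND_VCPUS / PROVISIONED_VCPUS
hourly_cost = PROVISIONED_VCPUS * COST_PER_VCPU_HOUR
wasted_hourly = (PROVISIONED_VCPUS - PEAK_DEMAND_VCPUS) * COST_PER_VCPU_HOUR

print(f"utilization:     {utilization:.1%}")
print(f"hourly cost:     ${hourly_cost:.2f}")
print(f"wasted per hour: ${wasted_hourly:.2f}")
```

Here you are paying for 24 vCPUs of headroom every hour; whether that headroom is prudent buffer or waste depends on how spiky your traffic is.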
In almost every application, being able to scale successfully involves this series of steps:
- Identify the part of your system that will become the next bottleneck.
- Understand what resources or architecture changes will allow you to scale beyond that bottleneck.
- Make a plan for how you will know when you need to scale and how you will execute the scaling when the time comes.
- Repeat for your next bottleneck! (Pro Tip: There is always a Next Bottleneck™!)
Horizontal vs Vertical Scaling
In managing different parts of your system, you'll encounter unique constraints that influence how you address bottlenecks. A key decision in this process is choosing between horizontal and vertical scaling: horizontal scaling means adding more nodes to the system, while vertical scaling means upgrading the capabilities of the nodes you already have.
Vertical scaling is the idea of taking some “smaller” resource that you are using to run your application and replacing it with a larger version. It is often the simplest type of scaling to understand, but it has limitations that can make it risky to rely on.
Most frequently when we talk about vertical scaling we are talking about the size of a server you are using to run your application. This might be a physical server in your data center or a virtual machine from AWS EC2 or another cloud provider. In either case, these machines have a fixed allotment of (v)CPU cores, memory, local disk storage, and network interfaces (and thus a maximum bandwidth for network I/O). If your application outgrows the machine, it’s a simple idea to just upgrade to a larger machine that has more of one or all of these resources.
This is particularly common with relational databases. Because of the complexity involved in keeping transactions atomic and consistent across multiple machines, a relational database is very often run on a single machine. If you need to scale your database, you might need to move it to a bigger machine.
However, this can be more challenging than it sounds. Here are some problems with this strategy:
- Migration downtime: unless you have a sophisticated migration strategy, you may need to specify a maintenance window for your application and take it offline for some period of time while you migrate to the new machine.
- Single point of failure: if you are pursuing vertical scaling, it usually means that there is one machine in your architecture that has a lot of responsibility. If there is a failure on this machine, it may bring your entire application down.
- Ceiling: at any given time, there is a maximum amount of CPU, RAM, etc. that you can get in a single machine. If your application’s requirements grow beyond this, you will not be able to scale further. This is an important situation to avoid.
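The ceiling problem can be illustrated with a toy sketch: vertical scaling is a walk up a fixed ladder of machine sizes, and eventually there is no next rung. The size names and vCPU counts below are hypothetical, not real cloud SKUs:

```python
# Vertical scaling as a walk up a fixed instance-size ladder.
# Sizes and vCPU counts are hypothetical, not any real provider's offerings.
SIZE_LADDER = [
    ("small",  2),   # (name, vCPUs)
    ("medium", 4),
    ("large",  8),
    ("xlarge", 16),
]

def next_size_up(current: str) -> str:
    """Return the next larger size, or raise once the ceiling is hit."""
    names = [name for name, _ in SIZE_LADDER]
    i = names.index(current)
    if i + 1 >= len(names):
        raise RuntimeError(
            f"{current} is already the largest size: vertical scaling ceiling reached"
        )
    return names[i + 1]

print(next_size_up("medium"))  # -> large
```

Each upgrade works until the last one doesn't; the `RuntimeError` at the top of the ladder is the moment you are forced to rethink your architecture.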
Horizontal scaling, on the other hand, is the idea of increasing the quantity of the resources you are using to run your application. So if your application is running on EC2 VMs, you would add additional VMs (of the same size). If your application is running in a Docker container, you would add additional containers. This is the core philosophy of platforms like Kubernetes.
Horizontal scaling is not without its own challenges. Most prominently, you have to make sure that your application is written in such a way that adding more VMs or containers will actually increase its ability to handle load. This usually means designing “stateless” applications that run behind a load balancer; when you add more nodes (VMs or containers) to the load balancer, it can distribute the load evenly and you can handle more traffic.
This only works if any one of your application’s nodes can handle any request at any time (in other words, handling a request must not rely on local state from a previous request). If you designed your application to operate this way, then adding 100 or 1,000 nodes to the load balancer is usually much easier and more realistic than trying to scale vertically to a server that is 100 or 1,000 times more powerful than your current one!
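As a minimal sketch of this idea, the snippet below models stateless nodes behind a round-robin load balancer. The node names and request strings are illustrative; the point is that each handler's output depends only on the request, never on state left behind by an earlier one, so any node can serve any request:

```python
from itertools import cycle

def make_node(name: str):
    """Build a stateless handler: its output is a pure function of the request."""
    def handle(request: str) -> str:
        return f"{name} handled {request}"
    return handle

# Three identical stateless nodes behind a round-robin "load balancer".
nodes = [make_node(f"node-{i}") for i in range(3)]
balancer = cycle(nodes)

responses = [next(balancer)(f"req-{i}") for i in range(6)]
for response in responses:
    print(response)
```

Because no node holds per-request state, adding a fourth node is just one more entry in `nodes`; capacity grows without any change to the handlers themselves.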
Linear scaling is the ideal to which we all aspire when designing our applications for scale. In a nutshell, the goal is that every time you add more resources to your application, you increase your capacity/throughput by the same amount, and thus you could scale indefinitely. In the next article, we’ll dig in and discuss whether this is achievable, and how.
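One widely cited way to model why real systems fall short of this ideal is Gunther's Universal Scalability Law, which penalizes throughput for contention and coherency (coordination) costs as nodes are added. The sketch below compares ideal linear throughput to a USL curve; the `alpha` and `beta` coefficients are illustrative values, not measurements from any real system:

```python
# Ideal linear scaling vs. a Universal Scalability Law (USL) curve.
# alpha models contention, beta models coherency (coordination) cost;
# both values below are illustrative, not measured.

def linear_throughput(nodes: int, per_node: float) -> float:
    """The ideal: every node adds the same amount of capacity."""
    return nodes * per_node

def usl_throughput(nodes: int, per_node: float,
                   alpha: float = 0.05, beta: float = 0.001) -> float:
    """Gunther's USL: throughput = N / (1 + alpha*(N-1) + beta*N*(N-1))."""
    n = nodes
    return per_node * n / (1 + alpha * (n - 1) + beta * n * (n - 1))

for n in (1, 10, 100):
    print(n, linear_throughput(n, 100.0), round(usl_throughput(n, 100.0), 1))
```

With these example coefficients, 100 nodes actually deliver slightly less total throughput than 10 nodes did; that retrograde region, and how to push it further out, is exactly the kind of territory part 2 will explore.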