The Complete Guide to Scalability Testing

By Author: Team Agora In Developer

One of the key challenges developers face is that their applications encounter problems that don’t pop up during development. This happens when software isn’t tested properly against real-world usage. That’s because a higher load can strain a system so much that the cracks will start to show.

That’s why performing a scalability test on your application before launch and at regular intervals afterwards is vital. It ensures that your application can cope with the added demand from increased customers and workloads.

This guide covers the essentials of scalability testing (a type of performance testing that is closely related to stress testing) and how to do it properly so that your users’ experience isn’t affected:

  1. What is Scalability?
  2. Vertical vs Horizontal Scalability
  3. Elasticity vs Scalability
  4. What is Scalability Testing?
  5. Upward vs. Downward Scalability Testing
  6. The Pros & Cons of Scalability Testing
  7. Scalability Testing Attributes
  8. How to Do Scalability Testing
  9. Scale Your RTC App with Agora

What is Scalability?

Scalability refers to the ability of software to adjust to increasing loads. It determines how many concurrent users and requests a system can handle. Scalable software is a must if you want to scale your real-time voice, video, or chat applications. Without scalability, operations are limited and as a result, so is revenue. It also means businesses lose out on valuable opportunities.

For example, suppose an e-commerce site can’t scale to accommodate 100,000 concurrent users during a big sale. In that case, the lack of scalability would lead to lost sales or a poor customer experience due to system slowdowns. More importantly, scalable software architecture can help prevent security flaws and exploits due to a system slowdown.

Vertical vs Horizontal Scalability

There are two kinds of scalability: horizontal scalability and vertical scalability.

Horizontal scalability, or scaling out, is the approach of adding more components or resources to the system to accommodate greater demand. For instance, a company can add more servers alongside existing servers to help share the system load. Horizontal scaling is the easiest way to scale an application, but is also the most expensive. Also, adding more hardware is more difficult and costlier to maintain.

The other type of scalability is called vertical scaling, where you improve the capacity or performance of existing components of your app. Vertical scaling is cheaper than horizontal scaling because it doesn’t require any significant investment in hardware. However, vertical scaling can get complex. Plus, compatibility issues and components piling up can affect system performance, leading to a more sluggish output.

As you can see, scalability is largely a hardware issue. Adding more servers or upgrading components can make an application scalable. However, it can also be addressed in software. Good programming practices let you optimize code so that it runs more efficiently, consumes fewer resources, and accommodates more users simultaneously.

Another good approach is to use cloud-based solutions, because these platforms make it easy to scale vertically by simply allocating resources (such as CPU and storage) as needed. They can also make your application elastic as well as scalable. But what’s the difference between the two?

Elasticity vs. Scalability

A discussion of scalability wouldn’t be complete without mentioning elasticity—a similar metric but one with a slightly different meaning.

Scalability is concerned with the system’s load that gradually increases over time. For example, if a business is steadily growing its customers, the website would need to scale up to accommodate the added demand. In many ways, scalability sets the workload threshold that the system can accommodate. It’s also proactive in nature—you need to scale up and account for future demand before you actually need it to ensure uninterrupted operations.

Elasticity, on the other hand, is the ability of the system to increase and decrease resources to account for a dynamic workload. In other words, it refers to the flexibility of a system to adjust to sudden spikes or dips in demand. For instance, an e-commerce app might see 500,000 concurrent users during peak times like Black Friday or Christmas, but just 50,000 users during the rest of the year. In this case, an elastic system can allocate resources during high demand, then free them up during low demand.

Elasticity is vital if you use a cloud-based system that charges on a per-use basis. It ensures you’re not overpaying for idle resources during periods of low load. It can also help allocate resources to applications that need them more.

Knowing the distinction between elasticity and scalability (and when it applies) is vital because addressing them requires two different solutions.

A scalable architecture requires “traditional” scaling methods, like adding more servers or improving the specifications of your existing machines. In either case, you increase the load capacity of your system permanently.

An elastic architecture requires real-time allocation of resources, which usually means an autoscaling mechanism. Not all cloud services support this, and those that do often need to be configured before they behave elastically.

One last comparison is that an elastic system must be scalable, but a scalable architecture isn’t necessarily elastic.
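
As a rough illustration of the difference, the sketch below sizes a server pool for a given number of concurrent users. The per-server capacity of 5,000 users and the function name `servers_needed` are assumptions made up for this example, not figures from any real platform.

```python
import math

USERS_PER_SERVER = 5_000  # assumed capacity of one server (hypothetical figure)


def servers_needed(concurrent_users: int, min_servers: int = 1) -> int:
    """Return how many servers cover the load, never fewer than min_servers."""
    return max(min_servers, math.ceil(concurrent_users / USERS_PER_SERVER))


# A system that is only scalable is provisioned once for the peak and stays there.
peak_servers = servers_needed(500_000)

# An elastic system re-evaluates as demand changes and releases idle servers.
for users in (50_000, 500_000, 50_000):
    print(f"{users:>7} users -> {servers_needed(users)} servers "
          f"(fixed provisioning would keep {peak_servers})")
```

In this toy model, the elastic system runs 10 servers for most of the year and 100 only during the peak, which is where the per-use savings come from.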

What is Scalability Testing?

Scalability testing is a non-functional software test that gauges how well an application or system performs at different user loads. Its goal is to find out if your system will break at a forecasted load and then reveal insights so you can fix it.

Scalability testing is often done in anticipation of increased expansion due to added users, transactions, processes, and other system loads. It can help spot areas of improvement to ensure that a website or app runs uninterrupted.

Scalability testing is similar to load testing in evaluating an application’s performance based on its load capacity. However, there’s an important distinction between the two.

Load testing is all about finding the breaking point of the system by subjecting it to maximum load in one go. Its main concern is to identify performance issues.

Scalability testing, on the other hand, increases the load gradually. It aims to explain why the system behaves the way it does at certain load levels and to provide insights for improving it. The main concern is to find out whether, and how, the system can accommodate a target number of users or transactions.

Upward vs. Downward Scalability Testing

There are two kinds of scalability testing: upward and downward.

Upward testing involves adding a virtual workload to the application until it reaches a breaking point. This helps determine the maximum capacity that the system can handle.

Downward testing is the reverse – it starts with a high workload, then gradually reduces that until you reach the optimal load level. A downward test is often performed after an upward test or if the application fails the initial scalability test.
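
The two directions can be sketched as simple loops. This is a minimal illustration, not a real test harness: `simulated_error_rate` is a hypothetical stand-in for running the application under a given load, and the capacity of 12,000 users and the 1% error threshold are invented values.

```python
def simulated_error_rate(virtual_users: int, capacity: int = 12_000) -> float:
    """Stand-in for a real test run: errors appear once load exceeds capacity."""
    if virtual_users <= capacity:
        return 0.0
    return min(1.0, (virtual_users - capacity) / capacity)


def upward_test(start: int, step: int, max_error_rate: float = 0.01) -> int:
    """Add load step by step and return the first load level that breaks."""
    users = start
    while simulated_error_rate(users) <= max_error_rate:
        users += step
    return users


def downward_test(start: int, step: int, max_error_rate: float = 0.01) -> int:
    """Reduce load step by step and return the highest tested load that passes."""
    users = start
    while users > 0 and simulated_error_rate(users) > max_error_rate:
        users -= step
    return users


breaking_point = upward_test(start=1_000, step=1_000)
sustainable = downward_test(start=breaking_point, step=250)
print(f"breaks at {breaking_point} users, sustainable at {sustainable} users")
```

Running the downward test with a smaller step than the upward test narrows down the sustainable load once the breaking point is known.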

The Pros & Cons of Scalability Testing


Pros

Why is this process so crucial to the success of businesses? Let’s take a look at some of the business benefits of regular scalability testing.

It can help you detect bugs early and fix them before you launch your software or expand it to account for more users. Not only will this lead to a more polished product, but it can also help lower your cost. According to the 1-10-100 Rule, the price to fix an error is ten times higher during development and up to 100 times higher after you launch the software.

Scalability testing can also help you determine the exact computing resources you need to fulfill your projected demand. This helps prevent overspending on new hardware or infrastructure investments.

Ultimately, scalability testing is all about delivering the best user experience. Crashes, unresponsiveness, and slowdowns due to a system strained with too much load can negatively affect customers. As a result, they can abandon your product altogether.

Scalability testing can help you test the system and check on its responsiveness. It allows you to be proactive and identify performance bottlenecks immediately so that you can resolve them ahead of peak seasons.


Cons

Scalability testing costs time and money, especially with larger applications. A detailed test can take a long time to finish, which can delay the application’s launch or cause it to go over budget. Because of these cons, it’s good to have a solid reason before testing scalability. Enterprise software that gets hundreds of thousands of concurrent users is a good candidate, while a simple app with limited users might not be.


Note that scalability testing is not a foolproof solution. A testing environment can’t mirror a production environment perfectly. There will always be real-world circumstances you either won’t know in advance or can’t replicate entirely.

In either case, these “unknown” loads might produce test results that look better than the application’s actual real-world performance. The same can happen when a scalability test has a limited scope or measures the wrong metrics. Such misleading results can cause more harm in the long run.

Scalability Testing Attributes: What Are You Testing?

There are many different variables you can evaluate with a scalability test. Which one you pick will depend on the nature of your application and infrastructure. Common metrics include the following:

Response Time

Response time is the delay between the time a user performs an action (such as clicking on a button or submitting the form) and when they receive a response from the application.

The most basic measure of response time is how long a web page loads from when a user clicks on a link or enters the URL on their browser. This is perhaps one of the most important metrics in scalability testing because it measures responsiveness, which significantly impacts user experience. A high response time makes the application seem sluggish and “buggy.”

The most common cause of high response time is server delays, so that’s what scalability tests typically look into. Specifically, the goal is to determine the maximum number of users the system can withstand before response time becomes too high.

Generally, the higher the number of users, the higher the response time will be. That is understandable, as the server struggles to process the huge volume of simultaneous user requests. If the user is located geographically further away from the server, it can also introduce a delay in response time.
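
A minimal sketch of how response time might be sampled at different levels of concurrency is shown here. `handle_request` is a hypothetical stub that only sleeps for 5 ms; in a real test it would be replaced by a call to the application under test. The percentile helper uses the nearest-rank method.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request() -> None:
    """Hypothetical stand-in for one request to the application under test."""
    time.sleep(0.005)


def timed_call(_: int) -> float:
    """Return the response time of one request in seconds."""
    started = time.perf_counter()
    handle_request()
    return time.perf_counter() - started


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def measure(virtual_users: int, requests: int = 40) -> dict[str, float]:
    """Send `requests` calls using `virtual_users` concurrent worker threads."""
    with ThreadPoolExecutor(max_workers=virtual_users) as pool:
        samples = list(pool.map(timed_call, range(requests)))
    return {"p50": percentile(samples, 50), "p95": percentile(samples, 95)}


for users in (1, 5, 20):
    result = measure(users)
    print(f"{users:>2} users: p50={result['p50'] * 1000:.1f} ms, "
          f"p95={result['p95'] * 1000:.1f} ms")
```

Because the stub only sleeps, the numbers stay roughly flat as users are added. Against a real server that has to share CPU, memory, and connections, the same loop would be expected to show response time climbing with load.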

Response time is slightly different in cloud or hybrid environments because the workload is typically distributed between multiple servers. In these cases, a scalability test gauges the effectiveness of the load balancer in ensuring that no server gets overloaded with too many requests.

It’s also worthwhile to measure the response time of each server component in such a distributed architecture. That way, you can see how each component contributes to the overall response time at every load level.

The best way to improve a high response time is to optimize your network, such as using a content delivery network (CDN). This works by spreading data around the globe to help reduce delays caused by large geographic distances between the user and server.

Unnecessarily long and complex code can also prolong response times. Even a few seconds of delay, multiplied by thousands of users, can introduce a significant slowdown. Optimizing and minifying code and scripts can speed up server processing and response times.


Throughput

Throughput measures how many requests or processes the application can handle in a set period. This varies depending on the nature of the application.

For example, a website might look at throughput as the number of web page requests a server can process in an hour. On the other hand, a database could measure throughput as the number of SQL queries it can handle per minute.

Generally, throughput shouldn’t change regardless of the server load placed on the system. An analogy is a fast food restaurant that can serve ten customers a minute. It doesn’t matter if thousands of customers are lined up outside – it should still be able to “process” at a steady rate. Thus, when doing a scalability test, developers often define a throughput goal that the application needs to meet at various loads.

A scalability test is often used to find the application’s upper bound or maximum throughput limit. Here, virtual users are added steadily until the throughput starts to even out and stabilize. But if it starts to drop, it could indicate a deeper problem or a bottleneck in the application.

Here’s an example. Suppose you’re doing a scalability test, and your throughput dips dramatically at a certain point. Further investigation reveals that, at the time of the drop, the system experienced a slowdown in the database layer. In this scenario, the database is the bottleneck that lowered throughput levels.

As you can see, unstable throughput is often just a symptom of an underlying issue that needs attention.
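
To make the plateau-versus-drop distinction concrete, here is a small sketch that scans the results of a stepped test and flags the first load level where throughput falls noticeably below the best value seen so far. The step data, the 60-second step length, and the 10% tolerance are all invented for illustration.

```python
STEP_SECONDS = 60  # assumed length of each load step

# (virtual users, completed requests during the step); made-up sample data
steps = [(100, 6_000), (200, 12_000), (400, 23_400), (800, 24_000), (1_600, 15_000)]


def throughput(completed_requests: int, duration_seconds: float) -> float:
    """Requests handled per second."""
    return completed_requests / duration_seconds


def find_drop(results, tolerance: float = 0.10):
    """Return the first load level whose throughput falls more than
    `tolerance` below the best throughput seen so far, or None."""
    best = 0.0
    for users, completed in results:
        rps = throughput(completed, STEP_SECONDS)
        if rps < best * (1 - tolerance):
            return users
        best = max(best, rps)
    return None


for users, completed in steps:
    print(f"{users:>5} users: {throughput(completed, STEP_SECONDS):.0f} requests/s")
print("throughput drop first seen at:", find_drop(steps))
```

In this sample data, throughput levels off at about 400 requests per second and then falls at 1,600 users, which is the load level worth investigating for a bottleneck.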

Memory Usage

Memory usage measures how much RAM the application consumes per user or per task, measured in byte units, such as megabytes or gigabytes. It’s a resource utilization metric because it gauges how efficiently the application uses a system resource (RAM).

Memory usage is a crucial metric because it can determine how fast and responsive an application is. If the system gets low on memory, it can slow down or crash the program entirely. Even a slight increase in memory usage can be detrimental if multiplied over multiple users.

There are two sides to fixing memory usage problems.

On one side, memory usage is mostly about best programming practices. Developers must code the application in such a way that it consumes the least memory. For example, the application code should optimize SQL queries to the database to minimize RAM utilization or redundant calls.

On the flip side, memory usage is all about hardware. The system memory can only support a finite number of simultaneous user requests or transactions. Scalability testing aims to find this limit. Once this threshold is reached, scaling a system further requires adding more RAM or database storage.
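
The effect of per-user memory multiplied over many users can be observed with Python’s standard `tracemalloc` module. In this sketch, `build_session` is a hypothetical example of per-user state; it is not taken from any real application, and the numbers it produces only describe this toy data structure.

```python
import tracemalloc


def build_session(user_id: int) -> dict:
    """Hypothetical per-user state an application might keep in RAM."""
    return {"user_id": user_id, "cart": list(range(100)), "token": f"{user_id:064d}"}


def peak_memory_bytes(user_count: int) -> int:
    """Peak bytes allocated by Python while user_count sessions are held in memory."""
    tracemalloc.start()
    sessions = [build_session(i) for i in range(user_count)]  # kept alive until return
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


for users in (100, 1_000, 10_000):
    peak = peak_memory_bytes(users)
    print(f"{users:>6} users: {peak / 1024:.0f} KiB peak, "
          f"{peak / users:.0f} bytes per user")
```

Dividing available RAM by the measured bytes per user gives a first estimate of the hardware limit described above, although a real application has many other memory consumers.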


CPU Usage

CPU usage measures how much processing power an application requires to work. It is usually reported as a percentage of available CPU capacity, although some tools express it in hertz units, such as megahertz (MHz).

CPU usage is a similar metric to memory usage. For starters, both are resource utilization metrics that evaluate the application’s efficiency in using system resources. It also directly affects user experience because high CPU usage can slow down or crash an application. At worst, it can shorten your system CPU’s life.

Like excessive memory usage, excessive CPU usage is often caused by poor programming practices. For example, redundant code or poorly managed threads can cause the software to use unnecessary processing power.

But, as with memory, the CPU is a limited resource that can only handle a set amount of tasks and user requests. Upgrading or adding more server components can help spread CPU usage and improve performance.
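
As a small illustration of how code choices affect CPU usage, the sketch below compares the CPU time of two functions that return the same answer. Both functions and the input are invented for this example; `time.process_time` reports CPU time consumed by the current process rather than wall-clock time.

```python
import time


def has_duplicates_quadratic(items: list[int]) -> bool:
    """Compare every pair: simple, but CPU work grows with the square of the input."""
    return any(
        items[i] == items[j]
        for i in range(len(items))
        for j in range(i + 1, len(items))
    )


def has_duplicates_set(items: list[int]) -> bool:
    """Same answer using a set, which needs roughly one pass over the input."""
    return len(set(items)) != len(items)


def cpu_seconds(task, *args) -> float:
    """CPU time consumed by the current process while running task(*args)."""
    started = time.process_time()
    task(*args)
    return time.process_time() - started


request_ids = list(range(1_000))  # made-up input with no duplicates (worst case)
for task in (has_duplicates_quadratic, has_duplicates_set):
    print(f"{task.__name__}: {cpu_seconds(task, request_ids):.4f} CPU seconds")
```

The exact figures depend on the machine, but the pairwise version should use far more CPU time, and that gap widens as the input grows.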

Network Usage

Network usage is a metric that determines how much bandwidth the application uses, measured in bytes per second (Bps).

A scalable application should have minimal network usage even with a large volume of user requests. If this is excessive, it can cause network congestion that will lead to high response times and a bad user experience.

Improving network usage often boils down to programming practices. For example, compression algorithms can reduce the data size the application sends across the network, minimizing bandwidth use.
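
The effect of compression on payload size is easy to check with Python’s standard `gzip` module. The JSON payload below is made-up sample data; actual savings depend on how repetitive the real data is.

```python
import gzip
import json

# Made-up response body: a list of 1,000 similar user records.
records = [{"user_id": i, "status": "online", "room": "lobby"} for i in range(1_000)]
payload = json.dumps(records).encode("utf-8")

compressed = gzip.compress(payload)

print(f"uncompressed: {len(payload)} bytes")
print(f"compressed:   {len(compressed)} bytes")
print(f"ratio:        {len(compressed) / len(payload):.2f}")
```

Compression trades some CPU time for bandwidth, so it is worth checking both metrics in the same test run.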

A scalability test is important to detect any dramatic spikes in network usage, so you can investigate further and resolve them. But network congestion can also be caused by variables outside your control, such as the type of network that you’re in. To rule these out, performing a scalability test in various network conditions is vital. For instance, you should have test scenarios for 4G, 5G, and Wi-Fi networks.

How to Test the Scalability of an Application

Scalability testing generally involves four steps:

  1. The first step is to assess the current load of the application and predict its future capacity based on factors like an increasing number of users. Doing this gives you a good benchmark to start with, plus it allows you to do a test within reason. For example, you don’t want to do a scalability test involving 500,000 concurrent users when you’ve only seen 50,000 on a peak day. That’s a waste of time, money, and effort.
  2. Next, you must design the test based on the metrics you want to check. This involves two things – test scenarios and the testing environment. A separate test environment to perform scalability testing is crucial, so you don’t disturb an organization’s operations. Remember to mirror your production environment as closely as possible to ensure accurate results, including the exact hardware specifications.
    You should also choose a reliable scalability testing tool that fits your needs. Examples include Apache JMeter, LoadNinja, Load Impact, LoadView, and NeoLoad.
    A test scenario is a series of repeatable software tasks that you’ll use to gauge system scalability. Usually, this is the most processor-intensive task, representing the application at its “busiest” state. For instance, it could be the pixel calculation algorithm of a graphics processing application that eats up considerable CPU and RAM.
    Ideally, you want separate scenarios representing various situations and different load levels (low, medium, and high) to check how your application will react. The easiest variable to control is the number of virtual users in the test. For example, suppose you expect 500,000 website visitors at peak seasons. In that case, you can test scalability with 500,000 virtual users to verify whether the application will break.
  3. Once the test environment and script are ready, you can execute them. The best approach here is to test at regular intervals to get a complete picture of the application’s scalability. Also, if you’re testing in a distributed environment, check that the load balancer utilizes multiple servers so no single server is overloaded.
  4. Finally, you should document and analyze the results after executing the tests, including the relevant metrics like throughput and network usage. Figure out at which load level the application broke down, implement some changes, and re-run the test scenario to verify the improvements.
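
The planning and analysis parts of these steps can be sketched in a few lines. Everything here is a simplified assumption: the compound-growth forecast, the choice of 25%, 50%, and 100% of the target as the low, medium, and high levels, the 300 ms limit, and the recorded results are all invented for illustration. The test execution itself (step 3) is left to your testing tool.

```python
from typing import Optional


def forecast_peak(current_peak: int, monthly_growth: float, months: int) -> int:
    """Step 1: project peak concurrent users, assuming compound monthly growth."""
    return round(current_peak * (1 + monthly_growth) ** months)


def load_levels(target_users: int) -> dict[str, int]:
    """Step 2: derive low, medium, and high scenarios from the target load."""
    return {"low": target_users // 4, "medium": target_users // 2, "high": target_users}


def first_failing_level(p95_ms_by_level: dict[str, float], limit_ms: float) -> Optional[str]:
    """Step 4: return the first load level whose p95 response time exceeds the limit."""
    for level, p95_ms in p95_ms_by_level.items():
        if p95_ms > limit_ms:
            return level
    return None


target = forecast_peak(current_peak=50_000, monthly_growth=0.05, months=12)
levels = load_levels(target)

# Step 3 would run each level with a load testing tool; these results are made up.
recorded_p95_ms = {"low": 120.0, "medium": 180.0, "high": 450.0}

print("planned load levels:", levels)
print("first failing level:", first_failing_level(recorded_p95_ms, limit_ms=300.0))
```

After fixing the bottleneck found at the failing level, the same plan can be re-run to verify the improvement.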

Scale Your Real-Time Communication App with Agora

Real-time communication is one application where scalability is crucial, unless you want users to experience sluggish video and laggy calls. Agora is the leader in delivering seamless real-time voice chat and video chat capabilities with a superior user experience. With Agora’s hyper scalability, your application will be able to withstand sudden spikes in traffic and scale from one to millions of simultaneous users gracefully during live video streaming.

Scale your application to support any number of end users, anywhere across the world, with Agora’s global edge network. Growing your application and audience is hard enough. Make sure your application does it seamlessly. To learn more, visit our website and get started for free today.