Is there such a thing as too fast when it comes to API performance testing?
Unless you’re trying to play a game engineered around a busy loop sized for a 286, probably not. We always want to get more done in less time.
That’s why the Constant Contact Web Services team is charged with building APIs that help thousands of small businesses get more done in less time. We want our customers to spend more time running their businesses and less time collecting, keying, and organizing data. With that said, there’s a lot of pressure on us to help them succeed.
Let’s put it this way: A large percentage of the more than half a million Constant Contact customers use our API every day. If each call made to our API took even a half second longer to complete, every 7,200 API calls (a fraction of our daily volume) would consume an additional hour of our customers’ time. In this post, I discuss the processes and tools we use in our performance testing.
What to look for in API testing
Performance means many things, but ultimately we’re concerned with throughput, load handling, and endurance metrics:
- How many requests to an endpoint can we handle per minute?
- What’s the average runtime for a call? The 90th percentile? The worst case?
- Does the system stand up to its users day after day?
- Does performance gracefully degrade or fall off dramatically?
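As a rough illustration of the runtime questions above, here is a minimal Java sketch that computes the average, the 90th percentile, and the worst case from a set of call runtimes. The class name `LatencySummary`, the nearest-rank percentile method, and the sample values are assumptions made for illustration; they are not our production tooling or real measurements.

```java
import java.util.Arrays;

/** Summarizes call runtimes: average, 90th percentile, and worst case. */
final class LatencySummary {
    final double averageMs;
    final long p90Ms;
    final long worstMs;

    private LatencySummary(double averageMs, long p90Ms, long worstMs) {
        this.averageMs = averageMs;
        this.p90Ms = p90Ms;
        this.worstMs = worstMs;
    }

    /** Nearest-rank percentile over an ascending-sorted, non-empty array (pct in (0, 100]). */
    static long percentile(long[] sortedMs, double pct) {
        int rank = (int) Math.ceil(pct * sortedMs.length / 100.0);
        return sortedMs[Math.max(rank, 1) - 1];
    }

    /** Builds a summary from unsorted runtimes in milliseconds. */
    static LatencySummary of(long[] runtimesMs) {
        if (runtimesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = runtimesMs.clone();
        Arrays.sort(sorted);
        double average = Arrays.stream(sorted).average().orElse(0);
        return new LatencySummary(average, percentile(sorted, 90), sorted[sorted.length - 1]);
    }

    public static void main(String[] args) {
        // Made-up runtimes in milliseconds, including one slow outlier.
        long[] sample = {120, 95, 110, 480, 105, 98, 101, 130, 99, 2100};
        LatencySummary s = LatencySummary.of(sample);
        System.out.printf("avg=%.1f ms, p90=%d ms, worst=%d ms%n", s.averageMs, s.p90Ms, s.worstMs);
    }
}
```

With a sample like this one, a single slow call drags the average well above what a typical caller sees, which is why it helps to look at the percentile and the worst case alongside it.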
We use a variety of tools to gather this data, so let’s start broad and work our way down.
Production environment API monitoring
While we have multiple levels of test environments, we’re most interested in seeing performance data for our production environment, on real hardware, with real users, and real usage patterns. We use New Relic, a low-overhead monitoring tool, to monitor and ensure our app is happy and healthy. New Relic provides insight into when and where our application might be running into trouble. It monitors our entire application stack, which is particularly critical for Web Services because we consume so many internal services and need to know which tier has the problem. With New Relic, we can see:
- How many requests we’re handling
- Their duration
- The number of failures that have occurred
- How much time is spent in the database or in particular web requests
- If we’re missing our pre-defined SLAs
We’re on a Java stack, so we can get method-level detail by annotating select Java methods with New Relic’s @Trace annotation. Here’s some sample output: Blue is JVM time, yellow is database time, green is external app dependencies time. In this example, we’re most interested in the spike in App server response that happened mid-chart:
With New Relic, we can identify that the spike was caused by a database blip on a particular call:
Using this information, we can talk to the DBAs and find out what happened during that time. There was also an associated spike in connection-related errors, which is not good. It’s one thing to have a slow API, but it’s something else entirely to have an unavailable one. This next example shows that our throughput (yellow) increases every evening as people run overnight batch jobs, with minimal impact on our response time (blue, in milliseconds).
API testing: pre-production load testing
So, we know we look pretty good live. But what about how we look before we actually go live? New Relic also runs on most of our QA environments, and we use Apache JMeter to generate meaningful loads against the APIs. Before we let AppConnect v2.0 out the door, we use JMeter to run an extensive suite of tests against it. The results are available on our internal wiki, and we are working towards regular, automated stress testing. With the combination of these two tools, we’re able to:
- Generate endpoint runtime statistics
- Locate load-related issues such as heap problems and thread pool limitations for a single server (which helps with horizontal scaling)
- Locate endurance-related issues such as leaks
JMeter is an excellent tool for generating quite a bit of load against a system and analyzing the results. It can be configured to spawn hundreds of concurrent request threads and run tests from saved scripts that can be parameterized and otherwise customized to provide varied input. Hitting the same potentially cached request over and over is probably not a realistic test.
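To make the idea of concurrent, parameterized load concrete, here is a minimal plain-Java sketch. It is not JMeter and not how JMeter is implemented; it only illustrates the same pattern of several request threads rotating through varied input. The class `MiniLoadTest`, the `run` method, the contact IDs, and the stand-in endpoint are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical mini load generator: concurrent workers, varied input. */
final class MiniLoadTest {
    private MiniLoadTest() {}

    /**
     * Starts `threads` workers that each call `endpoint` `callsPerThread` times,
     * rotating through `inputs`. Returns the number of calls that completed.
     */
    static int run(int threads, int callsPerThread, List<String> inputs, Consumer<String> endpoint) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger completed = new AtomicInteger();
        for (int t = 0; t < threads; t++) {
            final int worker = t;
            pool.submit(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    // Offset by worker index so threads do not all send the same input at once.
                    String input = inputs.get((worker + i) % inputs.size());
                    endpoint.accept(input);
                    completed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow();
                throw new IllegalStateException("load test timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("load test interrupted", e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        // Hypothetical contact IDs standing in for parameterized test data.
        List<String> contactIds = Arrays.asList("1001", "1002", "1003", "1004");
        Set<String> requested = ConcurrentHashMap.newKeySet();
        long start = System.nanoTime();
        // The "endpoint" only records the input; a real test would issue an HTTP request here.
        int calls = run(8, 50, contactIds, requested::add);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d calls over %d distinct inputs in %.3f s%n", calls, requested.size(), seconds);
    }
}
```

Swapping the stand-in endpoint for a real HTTP call and recording each call’s runtime would turn this into a crude load test, but a tool like JMeter handles ramp-up, scripting, and reporting far better than a hand-rolled loop.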
In addition to indirectly generating data in tools like New Relic, JMeter records its own results for quite a few statistical metrics (Average, Max, Standard Deviation, etc.). Using the data generated by JMeter, when run against a production-like environment, is a great way to validate SLAs. One thing to note: because JMeter generally runs on a single computer, if you are hitting a multi-server environment, you may encounter issues with caching by your load balancer since all requests are coming from the same IP address.
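For a sense of how recorded runtimes can be checked against an SLA, here is a small Java sketch that computes the average, maximum, and standard deviation and compares them to thresholds. The class `RunStats`, the threshold values, and the sample runtimes are hypothetical, and the population standard deviation used here is an assumption; JMeter’s own calculation may differ.

```java
import java.util.Arrays;

/** Hypothetical helper for summarizing load-test runtimes and checking an SLA. */
final class RunStats {
    private RunStats() {}

    static double average(long[] runtimesMs) {
        return Arrays.stream(runtimesMs).average()
                .orElseThrow(() -> new IllegalArgumentException("no samples"));
    }

    static long max(long[] runtimesMs) {
        return Arrays.stream(runtimesMs).max()
                .orElseThrow(() -> new IllegalArgumentException("no samples"));
    }

    /** Population standard deviation (an assumption; other tools may use the sample form). */
    static double stdDev(long[] runtimesMs) {
        double mean = average(runtimesMs);
        double sumOfSquares = 0;
        for (long ms : runtimesMs) {
            double delta = ms - mean;
            sumOfSquares += delta * delta;
        }
        return Math.sqrt(sumOfSquares / runtimesMs.length);
    }

    /** True when both the average and the worst case stay within the given limits. */
    static boolean meetsSla(long[] runtimesMs, double maxAverageMs, long maxWorstMs) {
        return average(runtimesMs) <= maxAverageMs && max(runtimesMs) <= maxWorstMs;
    }

    public static void main(String[] args) {
        // Made-up runtimes in milliseconds; the SLA limits below are also invented.
        long[] runtimes = {200, 400, 400, 400, 500, 500, 700, 900};
        System.out.printf("avg=%.1f ms, max=%d ms, stddev=%.1f ms%n",
                average(runtimes), max(runtimes), stdDev(runtimes));
        System.out.println("meets SLA: " + meetsSla(runtimes, 600, 1000));
    }
}
```

Checking the worst case separately from the average keeps one very slow call from hiding behind an otherwise healthy mean.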
As with New Relic, we usually run JMeter as a full system test, returning the combined runtime of the Web Services call all the way down to the databases. Unlike New Relic, JMeter does not show us the per-tier runtimes, just the total. Once we’ve identified problem areas using these tools, we need to drill down and do more extensive troubleshooting. If something appears to be problematic in the Java tier, for example, we then use JVM profiling tools. In my next post, I’ll discuss in depth how I’ve used ej-technologies’ JProfiler to complete a thorough analysis of our v2 API.
Do you agree or disagree with the process/methods I’ve described here? What tools do you use to monitor performance? Please share your thoughts below. I’d love to hear them.