Using JProfiler to Analyze Performance
I just completed a long analysis of our V2 API using JProfiler for API performance testing and want to spend some time — ok, the lion’s share of this post — discussing what we learned from this analysis.
JVM profiling, by and large, aims to identify long-running or redundant method invocations. It can also be used to identify memory issues, but we didn’t use it for that purpose in this particular exercise.
We do not use JProfiler in our production environment because it does its magic by modifying the JVM and taking measurements from the runtime environment, which negatively impacts performance. How much depends on exactly how it’s used, but more on that later.
We run our tests against a development environment, which means the performance of the services we depend on is unpredictable and certainly not a reliable simulation of their production counterparts. Due to hardware constraints and our testing goals, 90 percent of my tests were single-threaded sequential requests, often made against the same datasets. It should be readily apparent that profiled data cannot be used to establish or validate SLAs. It is primarily used to look for algorithmic issues.
JProfiler has two major modes of analyzing CPU time: profiled and sampled. There are major advantages and drawbacks to both, and understanding the differences is a key element to gathering usable data.
“Profiled mode” works by adding instrumentation to selected classes, recording method invocations, how long they took, and what other methods were called. The results are summed up for the whole call tree. Like JMeter, but at the method instead of the API level, we can see average, maximum, and minimum runtimes. We can also see how many times a particular method was called. The method statistics report looks like this:
The line in blue at the top shows the method name (redacted), the total time in this method, the number of calls made to it, the average, median, minimum, maximum, standard deviation, and outlier coefficients. The provided graph makes this more human-parsable. It’s clear from this data that out of the 20 calls to this method, one of them took far longer than the rest, which were all fairly comparable at 40 to 60 ms. This data gives us a clue as to what the issue might be — something happens on the first call that gets cached or otherwise sped up in subsequent invocations. Further analysis confirmed that a service we depend on has a cache that was missed on the first invocation.
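The pattern we found can be sketched in a few lines. This is a hypothetical illustration with made-up names, not our actual service code, assuming a dependency that lazily computes and caches a value on first use:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the pattern we observed: a dependency that
// computes a value on the first call and serves it from a cache on
// every later call, so only the first invocation is slow.
public class LazyCacheExample {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    int computeCount = 0; // how many expensive computations actually ran

    String lookup(String key) {
        // computeIfAbsent only runs the expensive path on a cache miss,
        // so the first call pays the full cost and the rest are fast.
        return cache.computeIfAbsent(key, k -> {
            computeCount++;
            return expensiveRemoteFetch(k);
        });
    }

    private String expensiveRemoteFetch(String key) {
        // Stand-in for the slow dependent-service call.
        return "value-for-" + key;
    }

    public static void main(String[] args) {
        LazyCacheExample ex = new LazyCacheExample();
        for (int i = 0; i < 20; i++) {
            ex.lookup("dataset-1");
        }
        // 20 invocations, but only the first did the expensive work.
        System.out.println(ex.computeCount); // prints 1
    }
}
```

In a method-statistics view, a shape like this shows up exactly as described above: one outlier invocation followed by a tight cluster of fast ones.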
Dependent service runtime brings up a few more points. By default, JProfiler shows runtime for threads in the “Running” state. If they are blocked, waiting on I/O or socket activity, the view must be switched to “All States” for that time to be adequately reflected. However, the runtime of the internal service may not be reliable depending on your development environment sizing, etc. Also, the performance penalties of a profiled JVM do not extend to dependencies outside the JVM, which means your tier may appear slower, relative to other components, than it actually is.
To illustrate the running-versus-blocked issue, here are the Hot Spots (top CPU runtimes) in the Running state for a test:
And these are the Hot Spots when All States are considered for the same test:
There are huge differences in runtimes here, but what can we discard? The first four methods in the All States view are actually blocked threads just listening for requests. They are not actually doing anything and can be safely ignored. In the first view (Running state), only the first call can be discarded, as it’s an active-polling loop.
However, the highlighted method call is the same in both views, and it’s clear that something important and substantial is happening here — this call is spending a large amount of time blocked. It’s also clear that it’s spending a substantial amount of time in the Running state. With this information, we need to ask: is it worth focusing on the 115 ms spent in Running when, in this development environment, the method spends more than 60x as much time blocked? Should we instead focus on the tiers we’re blocked by or waiting for? Or do we need a cache to avoid making these calls at all? Is that even an option? More information is needed to figure out the best approach, and often JVM profiling yields more questions than answers. But that’s a good thing.
So, we get some pretty useful data out of profiling, but it has two major drawbacks, one of which is often fatal:
- First, it can use a ton of memory.
- Second, profiling does not keep runtimes proportional, especially when methods are small and called often.
If every method call is penalized with, say, a millisecond of overhead for profiling purposes (it’s not really this bad), a method that is called 1,000 times and runs in 5 microseconds each time will show as taking 1,005 milliseconds when the real runtime is only 5 milliseconds. Compare this to a method that takes 500 milliseconds to run but is only run once: it will show as taking 501 milliseconds. That’s a huge distortion, leading to large errors and making profiling very unattractive at times.
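The arithmetic above can be made concrete with a back-of-the-envelope model. The 1 ms per-call penalty is the same deliberately exaggerated figure used in the text; all arithmetic is in microseconds to keep it exact:

```java
// Back-of-the-envelope model of instrumentation overhead, using the
// text's (deliberately exaggerated) 1 ms penalty per instrumented call.
public class ProfilingOverhead {
    // Apparent runtime = every call pays its real cost plus the overhead.
    static long apparentUs(long calls, long realPerCallUs, long overheadPerCallUs) {
        return calls * (realPerCallUs + overheadPerCallUs);
    }

    public static void main(String[] args) {
        long overheadUs = 1_000; // 1 ms per call, illustrative only

        // 1,000 calls of 5 us each: 5 ms of real work shows as 1,005 ms.
        long hot = apparentUs(1_000, 5, overheadUs);

        // One 500 ms call shows as 501 ms: barely distorted at all.
        long slow = apparentUs(1, 500_000, overheadUs);

        System.out.println(hot / 1_000 + " ms vs " + slow / 1_000 + " ms");
        // prints 1005 ms vs 501 ms
    }
}
```

The distortion scales with call count, not with real runtime, which is exactly why small, frequently called methods dominate a naive profiled view.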
There are two things that can be done about this:
- Auto-Tuning: With auto-tuning, JProfiler keeps its eye out for these methods and can exclude them from instrumentation after they’ve been automatically identified over repeated runs. Of course this means that tests need to be run over and over and over until auto-tuning has removed all of the disproportionate runtimes.
- Filtering: The second method is to only include classes you know you care about. For example, your own high-level packages and perhaps those containing important third-party classes. This has the added benefit of requiring a lot less memory for JProfiler. Even so, you are still affecting performance non-linearly. Furthermore, by selectively slowing down the JVM, the profiler may surface race conditions in multi-threaded tests that would otherwise stay hidden!
Of course, proactively excluding classes has a serious drawback. You are now no longer profiling the runtimes of quite a few of the methods in your call stack. However, you really shouldn’t care about that if you’re looking at the right classes. If you’re looking at too many classes, you’re probably misusing the Profiler in the first place!
The lesson learned: Don’t use the Profiling mode to go fishing for problems. Instead, use it to precisely pinpoint serious performance bottlenecks and focus on particular target methods.
If you need to go fishing, use “Sampled mode” instead.
“Sampled mode” operates by pausing the JVM every 5 milliseconds and recording the call stack. The resultant data is organized exactly like it is in Profiled mode, except there is no total invocation count available. Methods that take under 5 milliseconds to run will usually never be sampled; when they are, their runtimes will be slightly distorted downwards because most calls are missed, but this should be more or less inconsequential. Another consideration is that the total runtime will be distorted upwards, as stopping the JVM and gathering all this data every 5 ms is not cheap.
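The general idea behind a sampling profiler can be shown with a toy sketch. This is emphatically not JProfiler’s implementation (which uses JVMTI internals), just the concept: periodically capture every thread’s stack and count which frames are on top.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of how a sampling profiler works: periodically grab
// every thread's stack trace and tally the currently executing frames.
// Not JProfiler's implementation, just the concept.
public class ToySampler {
    final Map<String, Integer> hotSpots = new HashMap<>();

    void sampleOnce() {
        for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
            if (stack.length == 0) continue; // some JVM threads report no frames
            // Attribute this sample to the top (currently executing) frame.
            String top = stack[0].getClassName() + "." + stack[0].getMethodName();
            hotSpots.merge(top, 1, Integer::sum);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ToySampler sampler = new ToySampler();
        // Sample every 5 ms for ~100 ms. Methods that finish faster than
        // the interval will usually never show up in the counts at all.
        for (int i = 0; i < 20; i++) {
            sampler.sampleOnce();
            Thread.sleep(5);
        }
        System.out.println(sampler.hotSpots.size() + " distinct top frames sampled");
    }
}
```

This also makes the trade-off visible: you only ever see what happened to be on the stack at sample time, so invocation counts are gone, but no per-call overhead is added to the methods themselves.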
While Sampled mode provides a lot less data than Profiled mode (to the point where the JProfiler manual says in most cases you should use Profiled), it’s preferable when doing broad analysis. For example, Sampled mode showed us accurately that the many short methods we call to parse our JSON payloads were inconsequential compared to the places where we call extra constructors with I/O overhead. Profiled mode had this relationship off by well over an order of magnitude because of the overhead of instrumenting those tiny JSON methods.
JVM profiling is only one part of a broader performance strategy. A few more lessons learned:

- Full-system testing: Always do full-system testing using real hardware, substantially real data, and various loads. Look for non-linear performance degradation. When do requests begin getting denied, and how close is your production system to this point?
- Scaling: While you of course need to focus your testing on today’s expected load, you also need to understand how things will scale to handle future requirements, or react to a temporary spike in traffic due to an unforeseen event.
- Longevity testing: Run these tests for lengthy periods of time to make sure your systems don’t slow down the longer they run, pointing to things like memory leaks.
- Monitoring: Keep an eye on your production systems with appropriate monitoring tools. Look for bottlenecks. It’s better to find and fix a small problem in a real system than spend time digging out a bigger problem in a theoretical system.
- Too much data: Don’t drown yourself in too much data, especially when it can be ambiguous, or worse, misleading. Use broad sampling tools to search for issues in your algorithms and pay the most attention to tool-identified Hot Spots.
- JVM Profiling: Have a focused goal when JVM profiling. There are so many variables in a system that not knowing exactly what you’re looking for means you’ll be unable to hold the correct parameters constant. You might focus too heavily on a method that seems expensive until you are looking at the entire picture. Are you interested in how your system reacts to requests for larger chunks of data or perhaps more requests in the same period of time? If you reuse objects to avoid construction overhead, are you making a bigger problem out of lock contention, or worse, introducing threading bugs? If you don’t know what it is you’re looking for, you don’t know enough to use this tool yet. Put it down.
- Share data: Record all of your data points and make them available to the team! It’s the best way to know how a change you made will affect the system.
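The object-reuse trade-off mentioned above has a classic JVM example: `SimpleDateFormat` is expensive to construct but not thread-safe, so naive sharing is a threading bug, synchronizing on it trades construction cost for lock contention, and a `ThreadLocal` sidesteps both. A minimal sketch:

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// The reuse trade-off in miniature: SimpleDateFormat is costly to build
// but not thread-safe. Sharing one instance without a lock is a bug;
// locking it trades construction overhead for contention; a ThreadLocal
// gives each thread its own instance with neither problem.
public class FormatterReuse {
    // Unsafe if used from multiple threads without synchronization.
    static final SimpleDateFormat shared = new SimpleDateFormat("yyyy-MM-dd");

    // One instance per thread: no locking, construction amortized per thread.
    static final ThreadLocal<SimpleDateFormat> perThread =
        ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

    static String formatWithLock(Date d) {
        synchronized (shared) { // correct, but serializes all callers
            return shared.format(d);
        }
    }

    static String formatThreadLocal(Date d) {
        return perThread.get().format(d); // no shared mutable state, no lock
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L);
        // Both strategies produce the same result; they differ only in
        // how they behave under concurrent load.
        System.out.println(formatWithLock(epoch).equals(formatThreadLocal(epoch)));
    }
}
```

Which variant wins depends entirely on the load profile, which is precisely why you need a focused question before reaching for the profiler.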
This list isn’t exhaustive, and there is certainly a lot to consider in the wide world of performance testing. Hopefully we’ve been able to provide you with a few new insights.
Please share your experiences doing JVM Profiling or profiling using other tools with us below. We’d love to hear from you.