Last June I saw an interesting conference talk at J-Spring by Martijn Verburg (from jClarity) about the Performance Diagnostic Methodology (PDM), a structured approach to finding the root cause of Java performance problems. In this post I will highlight the key concepts, but I do recommend watching a recording of the talk from Devoxx UK. In the next part of this post, we will apply the theory to some problem applications.
What is the PDM?
As mentioned in the introduction, the Performance Diagnostic Methodology (PDM) is a structured approach to finding the root cause of Java performance problems. When performance issues occur, people often panic and start tuning the JVM without knowing whether they are addressing the actual cause of the issue. A structured approach helps you exclude possible causes and points you in the right direction, so you can solve the issue appropriately. The approach is visualized in the following scheme (recreated from the original with the permission of Martijn Verburg and Kirk Pepperdine from jClarity).
In the next sections we will walk through the scheme and highlight which tools can help you analyze the performance issue.
Before we dive into the scheme, there are three things you need to know about your infrastructure and application. If your own application does not have these in place, it is time to take some action 😉.
- You must know what your actual resources are: the specifications of the hardware your application runs on and where it is running. This might be trivial when you host the hardware yourself, but it can be a challenge when your application runs in a cloud environment.
- Ensure that you have logical and physical architecture diagrams of your application, and know its data flows. If a user reports problems with a certain functionality, this makes it easier to pinpoint where in your application the problem occurs.
- Have a measurement at each entry and exit point of your architecture. This also helps you pinpoint the location of the problem: with measurements in place, you can verify whether, for example, the time needed to handle a request increases over time, and you can see where a possible bottleneck occurs in your application.
Obviously, all of the above has to be in place before you run into problems. Once a problem occurs, there is no time to gather information about your hardware or the architecture of your application, and doing so costs time you need to solve the problem. Your users, customers and a bunch of managers will probably be putting a lot of pressure on you to fix it, and they won’t be happy when you first need to document the architecture of your application.
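To illustrate the first and the last point of the list above, here is a minimal sketch that measures an entry point from inside a Java application, assuming no monitoring tool is in place yet. The class EntryPointMetrics, the method timed and the entry point name "GET /orders" are hypothetical; in a real application a metrics library or your framework's built-in metrics would normally do this. The main method also prints the number of CPUs and the maximum heap as seen by the JVM, which can be lower than the host's specifications when the application runs in a container.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class EntryPointMetrics {

    // Total elapsed nanoseconds and number of calls, per entry point name.
    private static final Map<String, LongAdder> TOTAL_NANOS = new ConcurrentHashMap<>();
    private static final Map<String, LongAdder> CALLS = new ConcurrentHashMap<>();

    // Runs the work and records its duration under the given entry point name,
    // also when the work throws an exception.
    public static <T> T timed(String entryPoint, Callable<T> work) throws Exception {
        long start = System.nanoTime();
        try {
            return work.call();
        } finally {
            long elapsed = System.nanoTime() - start;
            // Record the time before the count, so a non-zero count always has a total.
            TOTAL_NANOS.computeIfAbsent(entryPoint, k -> new LongAdder()).add(elapsed);
            CALLS.computeIfAbsent(entryPoint, k -> new LongAdder()).increment();
        }
    }

    public static long calls(String entryPoint) {
        LongAdder count = CALLS.get(entryPoint);
        return count == null ? 0 : count.sum();
    }

    // Average duration in milliseconds; 0 if the entry point was never called.
    public static double averageMillis(String entryPoint) {
        long n = calls(entryPoint);
        if (n == 0) {
            return 0.0;
        }
        return TOTAL_NANOS.get(entryPoint).sum() / (double) n / 1_000_000.0;
    }

    public static void main(String[] args) throws Exception {
        // Resources as the JVM sees them; in a container these can be lower than the host's.
        Runtime runtime = Runtime.getRuntime();
        System.out.println("CPUs visible to the JVM: " + runtime.availableProcessors());
        System.out.println("Max heap: " + runtime.maxMemory() / (1024 * 1024) + " MB");

        // Hypothetical entry point, called a few times with simulated work.
        for (int i = 0; i < 3; i++) {
            timed("GET /orders", () -> {
                Thread.sleep(20);
                return "ok";
            });
        }
        System.out.printf("GET /orders: %d calls, avg %.1f ms%n",
                calls("GET /orders"), averageMillis("GET /orders"));
    }
}
```

Logging these averages periodically gives you the baseline you need to notice that request times are increasing over time.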
Kernel dominant, user dominant or no dominator
Now back to the scheme. We can distinguish three sections: kernel dominant, user dominant and no dominator. The first step is to find out what your CPU is doing. You can use the Linux tool vmstat to determine in which section you need to start searching. Run the vmstat command with, e.g., parameter 5, which prints the vmstat output every 5 seconds (the meaning of the other columns is explained in the vmstat man page). Below is an example of the output on a system running a simple Java Spring Boot application:
$ vmstat 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd    free   buff   cache   si   so    bi    bo   in   cs us sy id wa st
 5  0      0 3261272  96056 1042868    0    0  1579   519   20 3626 45  9 34 13  0
 0  0      0 3261264  96064 1042832    0    0     0     2   19  423  6  2 91  2  0
 0  0      0 3261264  96064 1042832    0    0     0     0   19  429  6  2 92  0  0
 0  0      0 3261140  96064 1042832    0    0     0     0   19  311  4  1 95  0  0
The interesting part in our case is the last section containing the details of the CPU usage and more specifically the following two columns:
- us: percentage of user CPU time
- sy: percentage of system CPU time
As a rule of thumb: when the system CPU time exceeds 10%, the problem is likely kernel dominant. When the user CPU time approaches 100%, the problem is likely user dominant. When both system and user CPU are very low while your users are complaining about performance, the cause is probably something like a deadlock, or your application waiting for a response from an external interface.
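The decision rule above can be sketched in Java, assuming the vmstat column layout shown in the example output (us is the 13th and sy the 14th column). The 10% system CPU threshold comes from the scheme; reading "approaching 100%" as 90% or more and "very low" as less than 10% combined are assumptions for illustration, and the class CpuDominance is hypothetical.

```java
public class CpuDominance {

    public enum Dominance { KERNEL, USER, NO_DOMINATOR, INCONCLUSIVE }

    // From the PDM scheme: more than 10% system CPU points to the kernel.
    static final int SYSTEM_THRESHOLD = 10;
    // Assumption: "approaching 100%" user CPU is read as 90% or more.
    static final int USER_THRESHOLD = 90;
    // Assumption: "very low" means user + system CPU below 10%.
    static final int LOW_THRESHOLD = 10;

    // Classifies one data line of vmstat output. Assumes the column layout
    // r b swpd free buff cache si so bi bo in cs us sy id wa st,
    // so us is at index 12 and sy at index 13.
    public static Dominance classify(String vmstatLine) {
        String[] cols = vmstatLine.trim().split("\\s+");
        if (cols.length < 14) {
            throw new IllegalArgumentException("Unexpected vmstat line: " + vmstatLine);
        }
        int us = Integer.parseInt(cols[12]);
        int sy = Integer.parseInt(cols[13]);
        if (sy > SYSTEM_THRESHOLD) {
            return Dominance.KERNEL;        // context switching, IO, virtualisation
        }
        if (us >= USER_THRESHOLD) {
            return Dominance.USER;          // look at the JVM and the GC
        }
        if (us + sy < LOW_THRESHOLD) {
            return Dominance.NO_DOMINATOR;  // only relevant if users are complaining
        }
        return Dominance.INCONCLUSIVE;      // keep sampling
    }

    public static void main(String[] args) {
        // The four data lines from the example output.
        String[] sample = {
            " 5  0      0 3261272  96056 1042868    0    0  1579   519   20 3626 45  9 34 13  0",
            " 0  0      0 3261264  96064 1042832    0    0     0     2   19  423  6  2 91  2  0",
            " 0  0      0 3261264  96064 1042832    0    0     0     0   19  429  6  2 92  0  0",
            " 0  0      0 3261140  96064 1042832    0    0     0     0   19  311  4  1 95  0  0",
        };
        for (String line : sample) {
            System.out.println(classify(line) + " <- " + line.trim());
        }
    }
}
```

For the sample output, the first line (us 45, sy 9) is inconclusive and the other three lines are classified as no dominator. An idle system looks exactly like the no dominator case, which is why that verdict only means something when users are actually complaining.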
CPU System > 10% is Kernel
When vmstat indicates that the problem might be kernel dominant, the cause can be one of the following:
- Context switching: two or more applications constantly switching and ‘fighting’ for CPU time;
- Disk IO: tools to be used are
- Virtualisation: KVM CLI tools (virsh), Docker CLI tools, Kubernetes CLI tools or whatever virtualization software you are using;
- Network IO: tools to be used are
In any case, it is advisable to look into these together with a sysadmin, who is probably already familiar with these tools.
CPU User is approaching 100%
When vmstat indicates that the problem might be user dominant, it is time to take a look at the JVM, and more specifically at the Garbage Collector (GC). The first thing to do is to turn on GC logging. There are free tools available that give you a graphical view of the GC logs. One of these tools is GCEasy, whose website explains how to turn on GC logging. For Java 1.4 up to 1.8, you have to pass the following arguments to the JVM:
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<file-path>
For Java 9 and higher, which use unified JVM logging, you pass the following argument:
-Xlog:gc*:file=<file-path>
In both cases, <file-path> is the path of the file the GC log is written to. Once you have collected some GC logging, you can upload it to the GCEasy website, which provides a report with all kinds of graphs to help you interpret the details. Below are some guidelines for interpreting the results:
- a full GC will block your application;
- when you notice that the heap keeps growing up to its maximum, you probably need to increase the heap or the amount of memory available on your machine;
- when you notice a lot of full GCs that do not free heap, there is probably a memory leak;
- a normally running application shows a sawtooth pattern in the heap consumption graph.
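The third guideline can be turned into a rough check, assuming you have extracted the heap occupancy measured right after each full GC from the GC log. The rule used here (the post-GC level never drops and grows by more than a given fraction) and the class name LeakHeuristic are illustrative assumptions, not part of the PDM; the graphs in a tool like GCEasy remain the better way to judge this.

```java
import java.util.List;

public class LeakHeuristic {

    // Heuristic (an assumption for illustration): if the heap occupancy measured
    // right after each full GC never goes down and grows by more than
    // minGrowthFraction between the first and the last sample, the full GCs are
    // not freeing memory and a leak is likely. Any unit (bytes, MB) works.
    public static boolean looksLikeLeak(List<Long> heapAfterFullGc, double minGrowthFraction) {
        if (heapAfterFullGc.size() < 3) {
            return false; // too few full GCs to judge
        }
        for (int i = 1; i < heapAfterFullGc.size(); i++) {
            if (heapAfterFullGc.get(i) < heapAfterFullGc.get(i - 1)) {
                return false; // a full GC freed memory below the previous level
            }
        }
        long first = heapAfterFullGc.get(0);
        long last = heapAfterFullGc.get(heapAfterFullGc.size() - 1);
        return first > 0 && (last - first) / (double) first > minGrowthFraction;
    }

    public static void main(String[] args) {
        // Healthy sawtooth: after each full GC the heap drops back to about the same level.
        List<Long> healthy = List.of(200L, 198L, 201L, 199L);
        // Leak: every full GC leaves more memory in use than the one before.
        List<Long> leaking = List.of(200L, 260L, 330L, 410L);
        System.out.println("healthy looks like a leak: " + looksLikeLeak(healthy, 0.2)); // false
        System.out.println("leaking looks like a leak: " + looksLikeLeak(leaking, 0.2)); // true
    }
}
```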
How to detect a memory leak?
Once we have excluded the JVM configuration and the GC as the cause, we probably have a memory leak in our application. In that case, we need a memory profiler. A free profiler is VisualVM. VisualVM used to be part of the JDK, but now (Java 11) it has been moved to GraalVM. It is, however, also separately downloadable from GitHub. The screenshot below shows an example of a memory leak: MyMemoryLeak objects are created but cannot be garbage collected. This can also be seen from the number in the Generations column: these objects have survived 56 garbage collections.
It is also possible that the objects whose number keeps increasing are standard Java objects. In that situation you have to drill down until you find an object that belongs to your application. This way you can determine where the memory leak occurs.
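To practice finding such a leak with VisualVM, a minimal program that reproduces one might look like the following. Only the MyMemoryLeak name is taken from the screenshot; the static CACHE list and the handleRequest method are a hypothetical example of a typical leak, where objects are added to a collection that is never cleared, so they stay reachable and survive every garbage collection.

```java
import java.util.ArrayList;
import java.util.List;

public class LeakDemo {

    // Name taken from the screenshot; each instance carries 1 KB of data.
    static class MyMemoryLeak {
        private final byte[] payload = new byte[1024];
    }

    // The leak: a static list that is only ever appended to. Everything in it
    // stays reachable, so the GC can never free it and the objects survive
    // collection after collection.
    private static final List<MyMemoryLeak> CACHE = new ArrayList<>();

    // Hypothetical request handler that "caches" an object and never removes it.
    public static void handleRequest() {
        CACHE.add(new MyMemoryLeak());
    }

    public static int retained() {
        return CACHE.size();
    }

    public static void main(String[] args) throws InterruptedException {
        // Runs until stopped (or until the heap is exhausted), which gives you time
        // to attach VisualVM and watch the number of MyMemoryLeak instances grow.
        while (true) {
            handleRequest();
            if (retained() % 10_000 == 0) {
                System.out.println("Retained MyMemoryLeak objects: " + retained());
            }
            Thread.sleep(1);
        }
    }
}
```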
When the CPU usage for both kernel and user is low and your users are complaining about performance issues, threads are probably waiting a long time (e.g. for an external system) or are blocked on locks. This kind of behavior can also be analyzed with VisualVM. In the screenshot below, we can clearly see that a deadlock occurs between the two threads MyTaskExecutor-1 and MyTaskExecutor-2.
Keep in mind that no profiler is 100% accurate.
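A deadlock like the one in the screenshot can be reproduced with two threads that take the same two locks in opposite order. The sketch below does that and then asks the JVM itself for deadlocked threads through ThreadMXBean.findDeadlockedThreads() from the standard java.lang.management API. The thread names match the screenshot; the lock objects and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class DeadlockDemo {

    private static final Object LOCK_A = new Object();
    private static final Object LOCK_B = new Object();

    // Starts two threads that acquire LOCK_A and LOCK_B in opposite order.
    // The latch makes sure each thread holds its first lock before it tries
    // the second one, so the deadlock happens every time.
    public static void startDeadlock() {
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        Thread t1 = new Thread(() -> lockBoth(LOCK_A, LOCK_B, bothHoldFirstLock), "MyTaskExecutor-1");
        Thread t2 = new Thread(() -> lockBoth(LOCK_B, LOCK_A, bothHoldFirstLock), "MyTaskExecutor-2");
        t1.setDaemon(true); // daemon threads, so the JVM can still exit
        t2.setDaemon(true);
        t1.start();
        t2.start();
    }

    private static void lockBoth(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // wait until the other thread holds its first lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                System.out.println("Never reached");
            }
        }
    }

    // Polls the JVM for deadlocked threads for at most timeoutMillis.
    public static long[] awaitDeadlock(long timeoutMillis) throws InterruptedException {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = threads.findDeadlockedThreads(); // null when there is no deadlock
            if (ids != null) {
                return ids;
            }
            Thread.sleep(50);
        }
        return new long[0];
    }

    public static void main(String[] args) throws InterruptedException {
        startDeadlock();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : threads.getThreadInfo(awaitDeadlock(5_000))) {
            System.out.println(info.getThreadName() + " waits for " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
    }
}
```

Running this and attaching VisualVM while it sleeps (for example by adding a pause at the end of main) should show the same two blocked threads as in the screenshot.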
Optimize your code
If you think that you need to optimize your code, think twice. The JIT compiler already optimizes your code, probably better than you would. Besides that, hand-optimized code is often less readable and maintainable. Tools that can help you optimize your code are JITWatch and the Java Microbenchmark Harness (JMH).
In this post we have described how the Performance Diagnostic Methodology works and how it can be used. Of course, this is a theoretical description. The next step would be to wait for a real-life performance problem and apply the theory in practice. Instead of waiting, we will create some problem applications in the next part of this post and verify whether we can apply the PDM to find the root cause of the problems.