In the first part of this post, we explained the Performance Diagnostic Methodology (PDM) and how to use it. But the proof of the pudding is in the eating, and therefore it is now time to apply the methodology. Of course, the best proof would be to apply it to a real-world performance issue, but instead of waiting for one, we will simulate some performance issues and verify whether the methodology works.
Testing Method
We will test the methodology by means of two cases:
- An application with a memory leak: objects are created that cannot be garbage collected;
- An application with a deadlock between two threads.
In order to simulate the above performance issues, we have created a Spring Boot application. By invoking different URLs of the application, we will simulate one of the above two cases. One URL, testperformance, will be used to test whether we encounter a performance issue. The source code can be found at GitHub.
We will run the application inside a Virtual Machine running Ubuntu 18.04.01 LTS and we will use Java 11.
Detect a Memory Leak
Our first test will try to verify whether we can detect a memory leak with the methodology.
The Problem Explained
The URL we will invoke is http://ipaddress:8080/startleaking/0. The code being invoked is the following:
@RequestMapping("/startleaking/{timeToWait}")
public void startLeaking(@PathVariable int timeToWait) {
    taskExecutor.execute(() -> {
        Map m = new HashMap();
        while (true) {
            for (int i = 0; i < NUMBER_OF_OBJECTS_TO_CREATE; i++) {
                if (!m.containsKey(i)) {
                    m.put(new FaultyKey(i), new MyPerformance());
                }
            }
            System.out.println("Created leaking objects");
            // Wait for the given amount of time
            try {
                Thread.sleep(timeToWait);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    });
}
After invoking the URL, a thread is started which will continuously add objects to a HashMap. The key is a FaultyKey object and a MyPerformance object is added as value. The FaultyKey class does not implement the equals and hashCode methods properly (we don’t override the equals method), causing the HashMap to grow over time instead of just replacing the values. With the variable timeToWait (in ms) at the end of the URL, it is possible to slow down the creation of objects.
The FaultyKey class is the following:
public class FaultyKey {

    Integer id;

    FaultyKey(Integer id) {
        this.id = id;
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }
}
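To see why a missing equals override makes the map grow, consider this small standalone sketch. It mirrors the FaultyKey idea from the post (the class name and fields are reconstructed for illustration): two keys with the same id hash into the same bucket, but since equals falls back to reference equality, the second put adds a new entry instead of replacing the first one.

```java
import java.util.HashMap;
import java.util.Map;

public class FaultyKeyDemo {

    // Illustrative key class mirroring the post's FaultyKey:
    // hashCode is overridden, equals deliberately is NOT,
    // so it falls back to Object's reference equality.
    static final class FaultyKey {
        final Integer id;
        FaultyKey(Integer id) { this.id = id; }
        @Override public int hashCode() { return id.hashCode(); }
    }

    // Puts the "same" logical key twice and returns the resulting map size.
    static int leakSize() {
        Map<FaultyKey, String> m = new HashMap<>();
        m.put(new FaultyKey(1), "first");
        m.put(new FaultyKey(1), "second");
        return m.size();
    }

    public static void main(String[] args) {
        // Both entries land in the same bucket (same hashCode), but because
        // equals() compares references, the second put does not replace the first.
        System.out.println("map size: " + leakSize()); // prints "map size: 2"
    }
}
```

Every iteration of the loop in startLeaking repeats this pattern for NUMBER_OF_OBJECTS_TO_CREATE keys, so the map keeps growing until the heap is exhausted.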
Start the Problem
Run the application with the following command (we limit the Java heap size to 500 MB; this way the application keeps running for a while before it eventually runs into performance problems):
java -Xmx500m -jar myperformanceplanet-0.0.1-SNAPSHOT.jar
After the application has started successfully, invoke the following URL to start the memory leak: http://ipaddress:8080/startleaking/0
When you continuously invoke the URL http://ipaddress:8080/testperformance to test the performance, you will notice that it often takes more time to respond than normal. There are better ways to test this, but for simplicity we do it this way. At this moment, we know we have some performance issue.
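One simple way to invoke the test URL continuously and watch the response times is a small timing loop, sketched below with the JDK's built-in HttpClient. The host, port, and iteration count are assumptions; adjust them to where the Spring Boot application actually runs.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ResponseTimer {

    // Generic helper: wall-clock milliseconds spent running the given action.
    static long timeMillis(Runnable action) {
        long start = System.nanoTime();
        action.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();
        // Assumed host/port: replace with the address of your running application.
        String url = "http://localhost:8080/testperformance";
        for (int i = 0; i < 10; i++) {
            long ms = timeMillis(() -> {
                try {
                    HttpRequest req = HttpRequest.newBuilder(URI.create(url)).GET().build();
                    client.send(req, HttpResponse.BodyHandlers.discarding());
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            System.out.println("request " + i + " took " + ms + " ms");
        }
    }
}
```

With the leak running, the printed times should occasionally spike: those are the moments where a full GC blocks the JVM, as we will see in the GC analysis below.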
Kernel Dominant or User Dominant?
The first thing to check according to the PDM is the CPU usage with vmstat 5; the output is:
$ vmstat 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  1    268 214460 138724 1202904    0    0   138   190   20 1746 28  4 65  3  0
 1  0    268 200448 138724 1202908    0    0     0     1   19 1729 94  6  0  0  0
 2  0    268 222892 138732 1202892    0    0     0    10   20 2249 93  7  0  0  0
 1  0    268 247072 138732 1202896    0    0     0     0   19 1019 96  4  0  0  0
 1  0    268 195488 138740 1202896    0    0     0    13   19 1186 96  4  0  0  0
 1  0    268 192764 138748 1202896    0    0     0    50   19 1633 97  3  0  0  0
We notice that our system CPU usage (sy column) is around 5%, but our user CPU usage (us column) is nearly 100%. This means that our problem is user dominant.
Check Garbage Collector
The next step is to check the Garbage Collector. Therefore, we need to enable GC logging. Start the application as follows and leave it running for a while (but stop before an OutOfMemoryError occurs):
java -Xmx500m -Xlog:gc*:file=log.gc -jar myperformanceplanet-0.0.1-SNAPSHOT.jar
Upload all log.gc files to http://gceasy.io/ and take a look at some of the results. GCEasy has noticed that our application is suffering from numerous full GCs, as can be seen from the result. Full GCs block the JVM, and this is why we noticed the delay now and then when invoking our test URL.
The following graph of the Heap Usage shows this very clearly (the red triangles are full GCs):
GCEasy advises us to increase the heap size, which would be a good first step. But if we do so in this case, it would not solve the problem. We could test this in a test environment, and then we would come to the conclusion that there must be another problem. GCEasy also has some kind of OutOfMemoryError detection, but it did not detect our memory leak. Despite this, the information and graphs provided by GCEasy are very comprehensive and help you identify the real problem. However, I would not take the advice for granted; try to interpret the results yourself with knowledge of your application.
Check On Memory Leaks
Let’s assume that we tested increasing the heap and noticed that this did not solve our problem. The last option according to the PDM is to profile our application. We will use VisualVM for this. The aim was to run VisualVM on the host machine and profile our application remotely. However, in version 1.4.2, it seems that Java 11 is not fully supported for remote profiling. A good blog about remote profiling with Java 1.8 can be found here.
So, we have to switch to an alternative path. We download and install GraalVM onto our Ubuntu VM. Navigate to the bin directory of GraalVM and execute the jvisualvm bash script. This will start VisualVM. Navigate to the Profiler tab and ensure that the package com.mydeveloperplanet.** is being watched in the memory settings. Next, click the Memory button in order to start profiling and invoke the startleaking URL again. The result of the profiling is the following:
We notice that the classes FaultyKey and MyPerformance are increasing in size and in the number of Allocated Objects. Also, the number of Generations (how many GCs the objects have survived) is increasing. We can expand the suspicious classes:
This shows us the location where the memory leak occurs.
Conclusion
Does the PDM help us to detect a memory leak? The answer is yes. Sure, this was a simple example of a memory leak and we knew beforehand where the memory leak was situated, but by following the PDM scheme, we were able to detect the memory leak, and that was exactly what we wanted to verify.
Detect a Deadlock
Our second test will try to verify whether we can detect a deadlock with the methodology.
The Problem Explained
The URL we will invoke is http://ipaddress:8080/createdeadlock. The code being invoked is the following:
@RequestMapping("/createdeadlock")
public void createDeadlock() {
    Object lock1 = new Object();
    Object lock2 = new Object();
    taskExecutor.execute(() -> {
        synchronized (lock1) {
            String threadName = Thread.currentThread().getName();
            System.out.println(threadName + " acquired lock on Lock1");
            try {
                Thread.sleep(10000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            synchronized (lock2) {
                System.out.println(threadName + " acquired lock on Lock2");
            }
        }
    });
    taskExecutor.execute(() -> {
        synchronized (lock2) {
            String threadName = Thread.currentThread().getName();
            System.out.println(threadName + " acquired lock on Lock2");
            try {
                Thread.sleep(10000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            synchronized (lock1) {
                System.out.println(threadName + " acquired lock on Lock1");
            }
        }
    });
}
After invoking the URL, two threads are started. One thread acquires a lock on object lock1 and the other on object lock2. After 10 seconds, the first thread tries to acquire a lock on object lock2 and the second thread tries to acquire a lock on object lock1. This causes a deadlock: both threads wait for each other, and neither ever releases the lock it already holds.
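Besides the tooling shown later, the JVM can also report monitor deadlocks programmatically via the java.lang.management API. The sketch below reproduces the same lock-ordering bug in miniature (shorter sleeps so it finishes quickly; the class name and timings are illustrative) and then asks ThreadMXBean which threads are deadlocked:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class DeadlockDetector {

    // Asks the JVM which threads are deadlocked on monitors;
    // findDeadlockedThreads() returns null when no deadlock exists.
    static long[] findDeadlocks() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        return bean.findDeadlockedThreads();
    }

    // Reproduces the post's lock-ordering bug with short sleeps so the
    // demo reaches the deadlocked state quickly.
    static long[] demoDeadlock() throws InterruptedException {
        Object lock1 = new Object();
        Object lock2 = new Object();
        Thread t1 = new Thread(() -> {
            synchronized (lock1) {
                sleepQuietly(200);
                synchronized (lock2) { /* never reached */ }
            }
        });
        Thread t2 = new Thread(() -> {
            synchronized (lock2) {
                sleepQuietly(200);
                synchronized (lock1) { /* never reached */ }
            }
        });
        t1.setDaemon(true); // daemon threads let the JVM exit despite the deadlock
        t2.setDaemon(true);
        t1.start();
        t2.start();
        Thread.sleep(1000); // give both threads time to block on each other's lock
        return findDeadlocks();
    }

    static void sleepQuietly(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    public static void main(String[] args) throws InterruptedException {
        long[] ids = demoDeadlock();
        System.out.println(ids == null ? "no deadlock detected"
                                       : ids.length + " threads deadlocked");
    }
}
```

This is the same mechanism VisualVM and jstack use under the hood when they flag a deadlock in the Threads view and thread dump.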
Start the Problem
Run the application with the following command:
java -jar myperformanceplanet-0.0.1-SNAPSHOT.jar
After the application has started successfully, invoke the following URL to create the deadlock: http://ipaddress:8080/createdeadlock
In the console logging, the following lines are printed, showing us that the deadlock has occurred:
MyTaskExecutor-1 acquired lock on Lock1 MyTaskExecutor-2 acquired lock on Lock2
Invoking the URL again does not print any statements in the logging anymore, since the deadlock of the previous invocation is still going on. This means that we have now simulated another kind of performance issue: new invocations of the URL actually do nothing.
Kernel Dominant or User Dominant?
The first thing to check according to the PDM is the CPU usage with vmstat 5; the output is:
$ vmstat 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0  33292 849260  21276 433996    0    1    82   149   20 1214 20  3 75  2  0
 0  0  33292 849384  21284 434040    0    0     3    13   19 1622 23  4 72  1  0
 2  0  33292 854960  21292 434896    0    0   172    12   19  751 11  2 83  3  0
 0  0  33292 855500  21292 434912    0    0     3     0   19  572  8  2 90  1  0
We notice that our user and system CPU usage is very low. This means that our problem is situated in the no dominator section.
Check on Waiting Threads
No dominator means that some thread is probably waiting for an external system, or a deadlock has occurred. We return to VisualVM to help us analyze this issue. We take a look at the Threads tab.
VisualVM has already detected the deadlock. The Thread Dump shows us the following:
"MyTaskExecutor-1" #33 prio=5 os_prio=0 cpu=60,04ms elapsed=972,20s tid=0x00007ff7a811b800 nid=0x5657 waiting for monitor entry  [0x00007ff7b22c5000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at com.mydeveloperplanet.myperformanceplanet.PerformanceController.lambda$createDeadlock$2(PerformanceController.java:110)
	- waiting to lock <0x00000000ec5d7370> (a java.lang.Object)
	- locked <0x00000000ec5d7380> (a java.lang.Object)
	at com.mydeveloperplanet.myperformanceplanet.PerformanceController$$Lambda$410/0x0000000100509840.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.1/ThreadPoolExecutor.java:1128)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.1/ThreadPoolExecutor.java:628)
	at java.lang.Thread.run(java.base@11.0.1/Thread.java:834)

   Locked ownable synchronizers:
	- <0x00000000ec5d73c0> (a java.util.concurrent.ThreadPoolExecutor$Worker)

"MyTaskExecutor-2" #34 prio=5 os_prio=0 cpu=55,56ms elapsed=972,20s tid=0x00007ff7a8117800 nid=0x5658 waiting for monitor entry  [0x00007ff7b33d1000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at com.mydeveloperplanet.myperformanceplanet.PerformanceController.lambda$createDeadlock$3(PerformanceController.java:127)
	- waiting to lock <0x00000000ec5d7380> (a java.lang.Object)
	- locked <0x00000000ec5d7370> (a java.lang.Object)
	at com.mydeveloperplanet.myperformanceplanet.PerformanceController$$Lambda$411/0x0000000100509c40.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.1/ThreadPoolExecutor.java:1128)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.1/ThreadPoolExecutor.java:628)
	at java.lang.Thread.run(java.base@11.0.1/Thread.java:834)

   Locked ownable synchronizers:
	- <0x00000000ec5d7598> (a java.util.concurrent.ThreadPoolExecutor$Worker)
This tells us that both threads are blocked and also shows us, for each thread, which lock it holds and which lock it is waiting for.
Conclusion
Does the PDM help us to detect a deadlock? The answer is yes. Again, this was a simple example, but by following the PDM scheme, we were able to detect the deadlock, and that was exactly what we wanted to verify.
Summary
Our goal in this second part of the post was to verify whether the PDM can work by using some simple performance examples. Following the PDM, we were able to detect the cause of the problems. Of course, applying the methodology in a real-world situation will be more difficult, but with these examples we were able to get acquainted with the tooling and the methodology.