How to Solve Your Java Performance Problems (Part 2)

In the first part of this post, we explained the Performance Diagnostic Methodology (PDM) and how to use it. But, the proof of the pudding is in the eating and therefore it is now time to apply the methodology. Of course, the best proof would be to apply the methodology to a real world performance issue, but instead of waiting for that, we will try to simulate some performance issues and verify whether the methodology can work.

Testing Method

We will test the methodology by means of two cases:

  1. An application with a memory leak, these objects cannot be garbage collected;
  2. An application with a deadlock between two threads.

In order to simulate the above performance issues, we have created one Spring Boot application. By means of invoking different URLs of the application, we will simulate one of the above two cases. One URL testperformance will be used to test whether we encounter a performance issue. The source code can be found at GitHub.

We will run the application inside a Virtual Machine running Ubuntu 18.04.01 LTS and we will use Java 11.

Detect a Memory Leak

Our first test will try to verify whether we can detect a memory leak with the methodology.

The Problem Explained

The URL we will invoke is http://ipadress:8080/startleaking/0. The code being invoked is the following:

@RequestMapping("/startleaking/{timeToWait}")
public void startLeaking(@PathVariable int timeToWait) {

  taskExecutor.execute(() -> {
    Map m = new HashMap();
    while (true) {
      for (int i = 0; i < NUMBER_OF_OBJECTS_TO_CREATE; i++) {
        if (!m.containsKey(i)) {
          m.put(new FaultyKey(i), new MyPerformance());
        }
      }

      System.out.println("Created leaking objects");

      // Wait for the given amount of time
      try {
        Thread.sleep(timeToWait);
      } catch (InterruptedException e) {
        e.printStackTrace();
      }
    }
  });

}

After invoking the URL, a thread is started which will continuously add objects to a HashMap. The key is object FaultyKey and an object MyPerformance is added as value. The FaultyKey class does not implement the equals and hashcode methods properly (we don’t override the equals method) causing the HashMap to grow over time instead of just replacing the values. With the variable timeToWait (in ms) at the end of the URL, it is possible to slow down the creation of objects.

The FaultyKey class is the following:

public class FaultyKey {

  Integer id;

  FaultyKey(Integer id) {
    this.id = id;
  }

  @Override
  public int hashCode() {
    return id.hashCode();
  }

}

Start the Problem

Run the application with the following command (we limit the Java heap size to 500 MB, this way the application keeps on running for awhile and eventually will get into performance problems):

java -Xmx500m -jar myperformanceplanet-0.0.1-SNAPSHOT.jar

After successful start of the application, invoke the following URL to start the memory leak: http://ipaddress:8080/startleaking/0

When you continuously invoke the URL http://ipaddress:8080/testperformance to test the performance, you will notice that the URL often takes more time to respond than normal. There are better ways to test this, but for simplicity we test it this way. At this moment, we know we have some performance issue.

Kernel Dominant or User Dominant?

First thing to check according to the PDM is the CPU usage with vmstat 5, the output is:

$ vmstat 5
procs -----------memory----------   ---swap-- -----io---- -system-- ------cpu-----
r b    swpd   free   buff   cache     si so      bi  bo    in   cs  us sy id wa st
1 1     268 214460 138724 1202904      0  0     138 190    20 1746  28  4 65  3  0
1 0     268 200448 138724 1202908      0  0       0   1    19 1729  94  6  0  0  0
2 0     268 222892 138732 1202892      0  0       0  10    20 2249  93  7  0  0  0
1 0     268 247072 138732 1202896      0  0       0   0    19 1019  96  4  0  0  0
1 0     268 195488 138740 1202896      0  0       0  13    19 1186  96  4  0  0  0
1 0     268 192764 138748 1202896      0  0       0  50    19 1633  97  3  0  0  0

We notice that our system CPU usage (sy column) is around 5%, but our user CPU usage (us column) is nearly 100%. This means that our problem is user dominant.

Check Garbage Collector

Next step is to check the Garbage Collector. Therefore, we need to enable the GC logging. Start the application as follows and leave the application running for awhile (but stop before an OutOfMemoryError occurs):

java -Xmx500m -Xlog:gc*:file=log.gc -jar myperformanceplanet-0.0.1-SNAPSHOT.jar

Upload all log.gc files to http://gceasy.io/ and take a look at some of the results. GCEasy has noticed that our application is suffering from numerous full GCs as can be seen from the result. Full GCs will block the JVM and this is the reason why we noticed the delay now and then when invoking our test URL.

gceasy-memory-leak-result

The following graph of the Heap Usage shows this very clearly (the red triangles are full GCs):

gceasy-memory-leak-heap

GCEasy advises us to increase the heap size which would be a good first thing to do. But if we do so in this case, this would not solve the problem. We could test this in a test environment and then we would come to the conclusion that there must be another problem. GCEasy also has some kind of OutOfMemoryError detection, but it did not detect our memory leak. Despite this, the information and graphs provided by GCEasy are very comprehensive and help you in identifying the real problem. But, I would not just take the advice for granted and try to interpret the result yourself with knowledge of your application.

Check On Memory Leaks

Let’s assume that we tested to increase the heap and noticed that this did not solve our problem. The last option according to the PDM is to profile our application. We will use VisualVM to profile. The aim was to run VisualVM on the host machine and to profile our application remotely. However, in version 1.4.2, it seems that Java 11 is not fully supported to profile the application remotely. A good blog for remote profiling with Java 1.8 can be found here.

So, we have to switch to an alternative path. We download and install GraalVM onto our Ubuntu VM. Navigate to the bin directory of GraalVM and execute the jvisualvm bash script. This will start VisualVM. Navigate to the Profiler tab and ensure that the package com.mydeveloperplanet.** is being watched in the memory settings. Next, click the Memory button in order to start profiling and invoke the startleaking URL again. The result of the profiling is the following:

visualvm-memory-profiler.PNG

We notice that the classes FaultyKey and MyPerformance are increasing concerning the size and the number of Allocated Objects. Also, the number of Generations (how many GCs the objects have survived) is increasing. We can expand the suspicious classes:

visualvm-memory-profiler-expanded

This shows us the location where the memory leak occurs.

Conclusion

Does the PDM help us in order to detect a memory leak? The answer is yes. Sure this was a simple example of a memory leak and we know beforehand where the memory leak is situated, but following the PDM scheme, we were able to detect the memory leak and that was exactly what we wanted to verify.

Detect a deadlock

Our second test will try to verify whether we can detect a deadlock with the methodology.

The Problem Explained

The URL we will invoke is http://ipadress:8080/createdeadlock. The code being invoked is the following:

@RequestMapping("/createdeadlock")
public void createDeadlock() {

  Object lock1 = new Object();
  Object lock2 = new Object();

  taskExecutor.execute(() -> {
    synchronized (lock1) {
      String threadName = Thread.currentThread().getName();
      System.out.println(threadName + " acquired lock on Lock1");

      try {
        Thread.sleep(10000);
      } catch (InterruptedException e) {
        e.printStackTrace();
      }

      synchronized(lock2) {
        System.out.println(threadName + " acquired lock on Lock2");
      }
    }
  });

  taskExecutor.execute(() -> {
    synchronized (lock2) {
      String threadName = Thread.currentThread().getName();
      System.out.println(threadName + " acquired lock on Lock2");

      try {
        Thread.sleep(10000);
      } catch (InterruptedException e) {
        e.printStackTrace();
      }

      synchronized(lock1) {
        System.out.println(threadName + " acquired lock on Lock1");
      }
    }
  });

}

After invoking the URL, two threads are started. One thread is acquiring a lock on object lock1 and the other on object lock2. After 10 seconds, the first thread tries to acquire a lock on object lock2 and the second thread tries to acquire a lock on object lock1. This will cause a deadlock, both threads are waiting on each other but never releasing the lock on the other object.

Start the Problem

Run the application with the following command:

java -jar myperformanceplanet-0.0.1-SNAPSHOT.jar

After successful start of the application, invoke the following URL to create the deadlock: http://ipaddress:8080/createdeadlock

In the console logging, the following lines are printed, showing us that the deadlock has occurred:

MyTaskExecutor-1 acquired lock on Lock1
MyTaskExecutor-2 acquired lock on Lock2

Invoking the URL again, does not print any statements in the logging anymore since the deadlock of the previous invocation is still going on. This means that we have simulated another kind of performance issue now: new invocations of the URL will actually do nothing.

Kernel Dominant or User Dominant?

First thing to check according to PDM is the CPU usage with vmstat 5, the output is:

$ vmstat 5 
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- 
r b    swpd   free  buff  cache     si  so    bi  bo     in   cs  us sy id wa st
0 0   33292 849260 21276 433996      0   1    82 149     20 1214  20  3 75  2  0
0 0   33292 849384 21284 434040      0   0     3  13     19 1622  23  4 72  1  0
2 0   33292 854960 21292 434896      0   0   172  12     19  751  11  2 83  3  0
0 0   33292 855500 21292 434912      0   0     3   0     19  572   8  2 90  1  0

We notice that our user and system CPU usage is very low. This means that our problem is situated in the no dominator section.

Check on Waiting Threads

No dominator means that some thread is probably waiting for an external system or a deadlock has occurred. We return to VisualVM to help us analyzing this issue. We take a look at the Threads tab.

visualvm-thread-deadlock

VisualVM has already detected the deadlock. The Thread Dump shows us the following:

"MyTaskExecutor-1" #33 prio=5 os_prio=0 cpu=60,04ms elapsed=972,20s tid=0x00007ff7a811b800 nid=0x5657 waiting for monitor entry [0x00007ff7b22c5000]
  java.lang.Thread.State: BLOCKED (on object monitor)
    at com.mydeveloperplanet.myperformanceplanet.PerformanceController.lambda$createDeadlock$2(PerformanceController.java:110)
    - waiting to lock <0x00000000ec5d7370> (a java.lang.Object)
    - locked <0x00000000ec5d7380> (a java.lang.Object)
    at com.mydeveloperplanet.myperformanceplanet.PerformanceController$$Lambda$410/0x0000000100509840.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.1/ThreadPoolExecutor.java:1128)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.1/ThreadPoolExecutor.java:628)
    at java.lang.Thread.run(java.base@11.0.1/Thread.java:834)

  Locked ownable synchronizers:
  - <0x00000000ec5d73c0> (a java.util.concurrent.ThreadPoolExecutor$Worker)

"MyTaskExecutor-2" #34 prio=5 os_prio=0 cpu=55,56ms elapsed=972,20s tid=0x00007ff7a8117800 nid=0x5658 waiting for monitor entry [0x00007ff7b33d1000]
  java.lang.Thread.State: BLOCKED (on object monitor)
    at com.mydeveloperplanet.myperformanceplanet.PerformanceController.lambda$createDeadlock$3(PerformanceController.java:127)
    - waiting to lock <0x00000000ec5d7380> (a java.lang.Object)
    - locked <0x00000000ec5d7370> (a java.lang.Object)
    at com.mydeveloperplanet.myperformanceplanet.PerformanceController$$Lambda$411/0x0000000100509c40.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.1/ThreadPoolExecutor.java:1128)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.1/ThreadPoolExecutor.java:628)
    at java.lang.Thread.run(java.base@11.0.1/Thread.java:834)

  Locked ownable synchronizers:
  - <0x00000000ec5d7598> (a java.util.concurrent.ThreadPoolExecutor$Worker)

This gives us the information that both threads are blocked and also gives us the location where the lock occurs.

Conclusion

Does the PDM help us in order to detect a deadlock? The answer is yes. Again, this was a simple example, but following the PDM scheme, we were able to detect the deadlock and that was exactly what we wanted to verify.

Summary

Our goal in this second part of the post was to verify whether the PDM can work by using some simple performance examples. Following the PDM, we were able to detect the cause of the problems. Of course, applying the methodology in a real world situation will be more difficult, but with these examples we were able to get acquainted with the tooling and the methodology.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.