G1 in Action: Is it better than the CMS?
I first heard about the Garbage First (G1) collector in a presentation given by my colleague Thomas Kluge a few months ago. He also wrote a nice post about G1, so if you need an introduction to the topic I strongly recommend reading Garbage First (G1) – Wenn der Müll im Mittelpunkt steht here on the NovaTec Blog.
I was immediately sold on big words like “new promising implementation“, “large heaps and short pauses“, etc. I wanted to give it a try right away; I was tempted to see if the Java world had really gotten the garbage collector that can “rule them all“. Thus, Thomas and I decided to spend one complete day testing the G1 and to see what differences it would bring in comparison to the Concurrent Mark Sweep (CMS) collector.
Choosing an Application
We were sure that we had the perfect application to perform tests on: the one that we develop in-house – inspectIT. Our free performance diagnosis tool has a Central Management Repository (CMR) component, which serves as the server component of the inspectIT architecture. The component needs to handle a large amount of data received from the inspectIT agents, and it holds the majority of that data in an in-memory buffer. To fit as much data as possible into this classic FIFO buffer, we usually start the CMR with a max heap size of 6GB or higher. It is usual that under load the CMR receives gigabytes of data in a matter of minutes. When the buffer is full, eviction of the oldest data starts, which effectively creates the need for a lot of garbage collection. Thus, we believed that all the requirements for G1 to show its full power were met:
- a large heap size
- more than 50% of the Java heap occupied by live data (as suggested by the Oracle G1 guide)
- long garbage collection pauses expected (due to the fact that under heavy load gigabytes of data need to be collected)
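As an illustration of why such a buffer produces so much garbage, the eviction behavior can be sketched roughly like this. Note that this is a hypothetical simplification, not the actual inspectIT implementation (the real buffer tracks occupancy in bytes, not element counts):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a size-capped FIFO buffer: once the cap is
// reached, adding a new element evicts the oldest one, which becomes
// unreachable and therefore garbage for the collector to reclaim.
class FifoBuffer<T> {
    private final Deque<T> deque = new ArrayDeque<>();
    private final int capacity;

    FifoBuffer(int capacity) {
        this.capacity = capacity;
    }

    synchronized void add(T element) {
        if (deque.size() >= capacity) {
            deque.pollFirst(); // kick out the oldest element
        }
        deque.addLast(element);
    }

    synchronized int size() {
        return deque.size();
    }

    public static void main(String[] args) {
        FifoBuffer<Integer> buffer = new FifoBuffer<>(3);
        for (int i = 0; i < 10; i++) {
            buffer.add(i); // elements 0..6 are evicted along the way
        }
        System.out.println(buffer.size()); // prints 3
    }
}
```

Under sustained load, every element evicted this way is dead data that the collector has to reclaim, which is exactly the pressure we wanted to put on G1.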
The tests were performed on two physical machines. On the first machine (QuadCore, 16GB RAM) the inspectIT CMR was running, occupying a total of 8GB of heap space (which makes the in-memory buffer 4.5GB). The second machine (QuadCore, 8GB RAM) was used to generate load, which consisted of simply sending data to the CMR. It is important to mention that the amount of data sent to the CMR was around 1GB per minute, meaning that once the buffer is full, the same amount of data needs to be garbage collected per minute.
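A quick back-of-the-envelope check of the GC pressure these numbers imply (all figures come from the setup above; nothing is newly measured):

```java
// Back-of-the-envelope arithmetic for the GC pressure described above.
public class GcPressure {

    // Minutes until a buffer of the given size fills at the given inflow rate.
    static double minutesToFill(double bufferGb, double inflowGbPerMin) {
        return bufferGb / inflowGbPerMin;
    }

    public static void main(String[] args) {
        double bufferGb = 4.5;       // in-memory buffer size
        double inflowGbPerMin = 1.0; // data received from the agents
        System.out.printf("Buffer full after ~%.1f minutes%n",
                minutesToFill(bufferGb, inflowGbPerMin));
        // After that point, eviction keeps pace with inflow, so roughly
        // 1GB of old data becomes garbage every minute.
    }
}
```

In other words, the buffer fills in about four and a half minutes, and from then on the collector has to reclaim roughly a gigabyte per minute for as long as the load lasts.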
The JVM version in all tests was the same – 1.7.0_25 x64.
What About the G1 Settings?
The next step was to test different G1 settings and see which combination performed best with our setup. We combined different values for Region Size, Reserve Percent, Heap Occupancy Percent, New & Survivor Ratio, etc. We were interested in the setting that would give us the highest throughput with, preferably, no Full-GC pauses. After trying out 12 different combinations, each in a 15-minute test, the following one was chosen for the battle with the CMS:
- Region size: 8MB
- Reserve percent: 15%
- Max GC Pause: 2 sec
- New ratio: 3
- Survivor ratio: 5
- All other settings were left at their defaults
We noticed that all the tuning we did brought very little change in the results, from which we concluded that it is quite hard to change the behavior of the G1 collector.
To give both collectors the chance to show their good and bad sides, we decided to perform two 24-hour tests: one with G1 as the collector (with the chosen settings) and a second with CMS as the collector (with the default settings defined by the inspectIT team). Here are the full JVM options used for both tests:
-Xms8192m -Xmx8192m -XX:MaxPermSize=192m -XX:PermSize=128m -XX:+UseG1GC -XX:MaxGCPauseMillis=2000 -XX:G1HeapRegionSize=8m -XX:G1ReservePercent=15 -XX:NewRatio=3 -XX:SurvivorRatio=5 -XX:+AggressiveOpts -XX:+UseFastAccessorMethods -XX:+UseBiasedLocking -XX:+UseCompressedOops -XX:+HeapDumpOnOutOfMemoryError -server -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+PrintTenuringDistribution -Xloggc:logs/gc.log
-Xms8192m -Xmx8192m -Xmn2048M -XX:MaxPermSize=192m -XX:PermSize=128m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC -XX:SurvivorRatio=5 -XX:TargetSurvivorRatio=90 -XX:+AggressiveOpts -XX:+UseFastAccessorMethods -XX:+UseBiasedLocking -XX:+UseCompressedOops -XX:+HeapDumpOnOutOfMemoryError -server -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+PrintTenuringDistribution -Xloggc:logs/gc.log
After the machines had been running for two complete days, we got the following results:
[Results table comparing G1 and CMS: Total Run Time, Number of Pauses, Number of Full GCs, Avg Full GC Duration, Max Full GC Duration]
From the above it is clear that the CMS heavily defeated the G1. Not only was the throughput much higher, but the CMS was also fast enough that no Full Garbage Collection was needed. The G1, on the other hand, seems to have been too slow: 41 “stop-the-world“ pauses occurred, each lasting 20 seconds on average.
The difference in the number of pauses is also large, and it is not easy to say why the G1 needed almost six times more pauses.
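Collection counts and accumulated pause times like the ones above do not have to come from the GC log alone; they can also be read at runtime through the standard GarbageCollectorMXBean API. A minimal sketch:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Prints the collection count and total collection time for each
// garbage collector registered in the running JVM. The bean names
// depend on the chosen collector, e.g. "G1 Young Generation" /
// "G1 Old Generation" with -XX:+UseG1GC, or "ParNew" /
// "ConcurrentMarkSweep" with -XX:+UseConcMarkSweepGC.
public class GcStats {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Polling these beans periodically is a cheap way to cross-check what the GC log reports, without parsing the log itself.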
First of all, don’t believe the big words: test on your own and find out which collector fits best in your environment. Even when you have decided on a collector, make sure to test it with different settings, as they can make a big difference. For example, the default settings for the G1 showed the worst results in our tests.
Do these results mean that the CMS is the better choice? Of course not. In our case it turned out that the CMS performs better, and we will keep it as the default collector for our application. In fact, we were very positively surprised by the throughput of 95% and think this is a great result. However, the G1 might perform better under different circumstances. The G1 garbage collector is quite new, so we still expect it to mature.
Our goal is to go on and test the G1 with some other applications too. We want to see in which cases it proves to be the better choice, and why tuning its settings brings so little change to its behavior. As soon as we have new results, we will be happy to present them here on the NovaTec Blog.
Interesting results! Despite being marketed for large heaps, I personally found that G1 is fantastic for small heap sizes. My application is soft real-time and the maximum allowed pause time is about 20ms. We prefer using many low-end servers instead of a single high-end one because the costs tend to be lower that way. CMS was terrible and had a really bad worst case. To handle this, I restricted my Xmx to 128MB (!) and used G1. Guess what: now the GC pauses are never greater than 15ms!
30 July 2018
You’re not documenting why the full GCs are occurring with the G1 collector, which is designed to never have any full GCs. There were bugs around the CodeCache filling up that could lead to unexpected full GCs, so perhaps that’s what you were running into.
21 December 2016