Monitoring the JVM Heap with JRuby
The JVM provides a number of management and monitoring interfaces that you, as a JRuby user, can use to keep an eye on or profile your application. There are interfaces for monitoring threads, monitoring memory usage, monitoring loaded JVM classes, and even for monitoring JRuby internals. The full set of built-in monitors are documented in the JDK’s java.lang.management docs.
In this article, we’ll explore some of the memory monitoring beans and how you can use them to analyze the memory characteristics of your application. I’ll be using the default Java 6 version for OS X 10.6.3, which is based on Hotspot, the JVM inside OpenJDK.
We start by requiring ‘java’ and importing the ManagementFactory from the JDK.
(require 'java'
import java.lang.management.ManagementFactory
)
Now we have access to the four primary beans associated with memory management on the JVM: GarbageCollector, MemoryManager, Memory, and MemoryPool MXBeans. We’ll grab them all first.
gc_beans = ManagementFactory.garbage_collector_mxbeans
memman_beans = ManagementFactory.memory_manager_mxbeans
mem_bean = ManagementFactory.memory_mxbean
mempool_beans = ManagementFactory.memory_pool_mxbeans
Let’s start with the simplest of these beans, the Memory bean.
The Memory bean provides the most basic information: current heap usage, current non-heap usage, and the count of objects waiting to be finalized. There’s also methods to force a GC run and to enable verbose GC logging. Let’s check current stats, run some code with verbose logging on, and force a GC run:
puts "Current heap usage: #{mem_bean.heap_memory_usage}"
puts "Current non-heap usage: #{mem_bean.non_heap_memory_usage}"
puts "Waiting to finalize: #{mem_bean.object_pending_finalization_count}"
puts "Enabling verbose GC logging..."
mem_bean.verbose = true
1_000_000.times { "foo" + "bar" }
puts "Forcing a full GC run..."
mem_bean.gc
mem_bean.verbose = false
puts "Disabling verbose GC logging"
puts
Here’s the output so far:
Current heap usage: init = 268435456(262144K) used = 1960416(1914K) committed = 270073856(263744K) max = 520093696(507904K)
Current non-heap usage: init = 12750848(12452K) used = 13608880(13289K) committed = 21921792(21408K) max = 100663296(98304K)
Waiting to finalize: 0
Enabling verbose GC logging...
[GC 17029K->1642K(263744K), 0.0014821 secs]
[GC 17066K->1653K(263744K), 0.0122954 secs]
[GC 17077K->1654K(263744K), 0.0012559 secs]
[GC 17078K->1654K(263744K), 0.0006338 secs]
[GC 17078K->1658K(263744K), 0.0050983 secs]
[GC 17082K->1646K(263744K), 0.0003545 secs]
[GC 17070K->1646K(263744K), 0.0005288 secs]
[GC 17070K->1644K(263744K), 0.0006449 secs]
[GC 17068K->1644K(263744K), 0.0090768 secs]
Forcing a full GC run...
[Full GC 11674K->1621K(263744K), 0.0617646 secs]
Disabling verbose GC logging
(The GarbageCollector beans provide information about all the garbage collectors in use on a given JVM. They allow you to query collection counts and total collection time.
Below, we’ll do some work that should trigger at least a couple GC runs, and then print out stats.
)
1_000_000.times { "foo" + "bar" }
total_collections = 0
total_time = 0.0
for gc_bean in gc_beans
puts "Name: #{gc_bean.name}"
puts " Pools: #{gc_bean.memory_pool_names.to_a.join(', ')}"
puts " Collections: #{gc_bean.collection_count}"
puts " Time: #{gc_bean.collection_time}"
total_collections += gc_bean.collection_count
total_time += gc_bean.collection_time
end
puts "Total collections: #{total_collections}"
puts "Total time: #{total_time}"
puts
And here’s the output for my JVM:
Name: ParNew
Pools: Par Eden Space, Par Survivor Space
Collections: 18
Time: 38
Name: ConcurrentMarkSweep
Pools: Par Eden Space, Par Survivor Space, CMS Old Gen, CMS Perm Gen
Collections: 2
Time: 149
Total collections: 20
Total time: 187.0
You may notice that each garbage collector has several memory pools associated with it. Let’s look at the memory pools now.
Memory pools represent areas of memory used to isolate objects with differing characteristics. For example, some JVMs will list a “Par Eden Space”, which represents the memory pool containing the newest and youngest objects. Some JVMs list a “Perm Gen” which will contain objects only collected during full GC runs.
Let’s iterate over all memory pools and get information on each:
for pool_bean in mempool_beans
puts "Name: #{pool_bean.name}"
puts " Type: #{pool_bean.type}"
puts " Peak usage: #{pool_bean.peak_usage}"
puts " Current usage: #{pool_bean.usage}"
puts " Usage after last collection: #{pool_bean.collection_usage}"
puts " Managers: #{pool_bean.memory_manager_names.to_a.join(', ')}"
end
puts
Here’s the listing of memory pools for me:
Name: Code Cache
Type: Non-heap memory
Peak usage: init = 167936(164K) used = 1004992(981K) committed = 1019904(996K) max = 33554432(32768K)
Current usage: init = 167936(164K) used = 1004992(981K) committed = 1019904(996K) max = 33554432(32768K)
Usage after last collection:
Managers: CodeCacheManager
Name: Par Eden Space
Type: Heap memory
Peak usage: init = 12582912(12288K) used = 15794176(15424K) committed = 15794176(15424K) max = 25165824(24576K)
Current usage: init = 12582912(12288K) used = 10244904(10004K) committed = 15794176(15424K) max = 25165824(24576K)
Usage after last collection: init = 12582912(12288K) used = 0(0K) committed = 15794176(15424K) max = 25165824(24576K)
Managers: ConcurrentMarkSweep, ParNew
Name: Par Survivor Space
Type: Heap memory
Peak usage: init = 2097152(2048K) used = 49824(48K) committed = 2621440(2560K) max = 4194304(4096K)
Current usage: init = 2097152(2048K) used = 2048(2K) committed = 2621440(2560K) max = 4194304(4096K)
Usage after last collection: init = 2097152(2048K) used = 2048(2K) committed = 2621440(2560K) max = 4194304(4096K)
Managers: ConcurrentMarkSweep, ParNew
Name: CMS Old Gen
Type: Heap memory
Peak usage: init = 251658240(245760K) used = 1682040(1642K) committed = 251658240(245760K) max = 490733568(479232K)
Current usage: init = 251658240(245760K) used = 1662536(1623K) committed = 251658240(245760K) max = 490733568(479232K)
Usage after last collection: init = 251658240(245760K) used = 1659912(1621K) committed = 251658240(245760K) max = 490733568(479232K)
Managers: ConcurrentMarkSweep
Name: CMS Perm Gen
Type: Non-heap memory
Peak usage: init = 12582912(12288K) used = 12861856(12560K) committed = 21250048(20752K) max = 67108864(65536K)
Current usage: init = 12582912(12288K) used = 12861856(12560K) committed = 21250048(20752K) max = 67108864(65536K)
Usage after last collection: init = 12582912(12288K) used = 12749352(12450K) committed = 20967424(20476K) max = 67108864(65536K)
Managers: ConcurrentMarkSweep
Note that the “usage” values are actually instances of MemoryUsage, which provides four different values for the pool:
- init: the amount of memory the JVM initially reserved from the OS
- used: the amount of memory currently used
- committed: the amount of memory actually available
- max: the maximum memory that will ever be available
These will change over time as objects are collected and the heap grows to meet application needs. Let’s show the current usage for the Par Eden Space, as an example:
par_eden = mempool_beans.find {|b| b.name == "Par Eden Space"}
usage = par_eden.usage
puts "Initially requested: #{usage.init}"
puts "Currently used: #{usage.used}"
puts "Total available: #{usage.committed}"
puts "Max potential: #{usage.max}"
puts
The output for “Par Eden Space”:
Initially requested: 12582912
Currently used: 10575384
Total available: 15794176
Max potential: 25165824
Memory pools also allow you to set usage thresholds. When a threshold is crossed, the JVM will increment a counter. This allows you to monitor whether or not your application is regularly exceeding a particular threshold, so you can adjust memory settings or work to reduce your application’s demands.
Here’s what that might look like for the “Par Survivor Space,” the “old” pool. We’ll monitor the collection usage threshold, or the amount of space in use after the most recent GC run. Some JVMs will support this or the general usage threshold, but neither are required to be supported.
We’ll set a limit that’s 5MB larger than the last collection usage and then run another loop that keeps memory alive a bit longer.
par_old = mempool_beans.find {|b| b.name == "Par Survivor Space"}
if par_old.collection_usage_threshold_supported?
par_old.collection_usage_threshold =
par_old.collection_usage.used + 5_000
1_000.times do
ary = []
10_000.times do
ary << "foo" + "bar";
end
end
puts "Crossed it: #{par_old.collection_usage_threshold_count} times"
end
And the result:
Crossed it: 104 times
Crossing the threshold here means that the minimum usage of the pool after a collection went over this size that many times. In this example, we create a 10,000 element array 1000 times and then throw it away; but while it’s alive, it gets promoted into this “Survivor” space. So we repeatedly create it, promote it, and release it, resulting in the threshold being crossed several times.
The final bean we haven’t covered, the MemoryManager bean, is usually just the superclass for other beans like GarbageCollector, so there’s not much more to say about them. On my system, the list of MemoryManager beans is the same as the list of GarbageCollector beans.
Hopefully this walkthrough has whetted your appetite for the JVM’s management and monitoring interfaces. In future posts, we’ll cover how you can use the other builtin monitors, how you can track JRuby’s internal monitors, and even how you can register your own monitors for any application. As always, leave questions and comments!
Share your thoughts with @engineyard on Twitter