Overview of memory management
The Garbage Collector allocates areas of storage in the heap. These areas of storage define Java™ objects. After allocation, an object remains live for as long as a reference (pointer) to it exists somewhere in the JVM; that is, while the object is reachable. When an object ceases to be referenced from the active state, it becomes garbage and can be reclaimed for reuse. When this reclamation occurs, the Garbage Collector must run any finalizer and also ensure that any internal JVM resources that are associated with the object are returned to the pool of such resources.
Every allocation requires a heap lock to be acquired to prevent concurrent thread access. To optimize allocation, particular areas of the heap are dedicated to a thread; this cache is known as the thread local heap (TLH), and the thread can allocate from it without the need to lock out other threads. This technique delivers the best possible allocation performance for small objects: a new object is allocated from the cache without the need to grab the heap lock. All objects less than 512 bytes (768 bytes on 64-bit JVMs) are allocated from the cache; larger objects are allocated from the cache if they can be contained in the existing cache.
The active state of the JVM is made up of the set of stacks that represents the threads, the statics that are inside Java™ classes, and the set of local and global JNI references. All functions that are invoked inside the JVM itself create a frame on the thread stack. This information is used to find the roots. A root is an object that has a reference to it from outside the heap. These roots are then used to find references to other objects. This process is repeated until all reachable objects are found.
When the JVM cannot allocate an object from the heap because of lack of contiguous space, a memory allocation fault occurs, and the Garbage Collector is invoked. The first task of the Garbage Collector is to collect all the garbage that is in the heap. This process starts when any thread calls the Garbage Collector either indirectly as a result of allocation failure, or directly by a specific call to System.gc(). The first step is to acquire exclusive control on the virtual machine to prevent any further Java™ operations. Garbage collection can then begin.
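The interaction between allocation, an explicit call to System.gc(), and heap usage can be observed from application code with the standard Runtime API. The following sketch is illustrative only: System.gc() is a request, not a command, and the numbers printed depend on the JVM and its current state.

```java
// Illustrative only: System.gc() is a request, not a guarantee. On this JVM
// an explicit request triggers the same collection cycle as an allocation failure.
public class GcRequestDemo {
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        byte[] block = new byte[4 * 1024 * 1024]; // a 4 MB allocation
        System.out.println("used after allocation: " + usedHeap() + " bytes");
        block = null;    // drop the only reference; the array is now garbage
        System.gc();     // ask the JVM to collect; timing is up to the collector
        System.out.println("used after gc request: " + usedHeap() + " bytes");
    }
}
```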
Heap sizing problems
For the majority of applications, the default settings work well. The heap expands until it reaches a steady state, then remains in that state, which should give a heap occupancy (the amount of live data on the heap at any given time) of 70%. At this level, the frequency and pause time of garbage collection should be acceptable.
For some applications, the default settings might not give the best results. Listed here are some problems that might occur, and some suggested actions that you can take. Use verbose:gc to help you monitor the heap.
The frequency of garbage collections is too high until the heap reaches a steady state.
Use verbose:gc to determine the size of the heap at a steady state and set -Xms to this value.
The heap is fully expanded and the occupancy level is greater than 70%.
Increase the -Xmx value so that the heap is not more than 70% occupied, but for best performance ensure that the heap never pages. If possible, the maximum heap size should fit in physical memory to avoid paging.
At 70% occupancy the frequency of garbage collections is too great.
Change the setting of -Xminf. The default is 0.3, which tries to maintain 30% free space by expanding the heap. A setting of 0.4, for example, increases this free space target to 40%, and reduces the frequency of garbage collections.
Pause times are too long.
Try using -Xgcpolicy:optavgpause. This reduces the pause times and makes them more consistent when the heap occupancy rises. It does, however, reduce throughput by approximately 5%, although this value varies with different applications.
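The 70% occupancy guideline above can be checked from inside an application with the standard Runtime API. This is a monitoring sketch, not an IBM tool; the 0.70 threshold is the guideline from this section, not a JVM constant.

```java
// A small sketch for checking the ~70% occupancy guideline.
// Uses only the standard Runtime API; 0.70 is the guideline, not a JVM constant.
public class HeapOccupancy {
    public static double occupancy() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return (double) used / rt.maxMemory(); // fraction of -Xmx currently used
    }

    public static void main(String[] args) {
        double o = occupancy();
        System.out.printf("heap occupancy: %.1f%%%n", o * 100);
        if (o > 0.70) {
            System.out.println("occupancy above 70%: consider raising -Xmx");
        }
    }
}
```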
Here are some useful tips:
• Ensure that the heap never pages; that is, the maximum heap size must be able to be contained in physical memory.
• Avoid finalizers. You cannot guarantee when a finalizer will run, and often they cause problems. If you do use finalizers, try to avoid allocating objects in the finalizer method. A verbose:gc trace shows whether finalizers are being called.
• Avoid compaction. A verbose:gc trace shows whether compaction is occurring. Compaction is usually caused by requests for large memory allocations. Analyze requests for large memory allocations and avoid them if possible. If they are large arrays, for example, try to split them into smaller arrays. (Consider using LinkedList instead of ArrayList. -sg)
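The advice to split large arrays can be sketched as follows. The chunk size (16K ints, 64 KB per chunk) is an illustrative choice, picked to replace one very large contiguous allocation with several smaller ones; the class name and API are ours, not a library type.

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of the "split large arrays" tip: instead of one contiguous
// array (a single large allocation that may force compaction), the data is
// stored in fixed-size chunks. The chunk size is an illustrative choice.
public class ChunkedIntArray {
    private static final int CHUNK = 16 * 1024;    // 16K ints = 64 KB per chunk
    private final List<int[]> chunks = new ArrayList<>();
    private final int size;

    public ChunkedIntArray(int size) {
        this.size = size;
        for (int remaining = size; remaining > 0; remaining -= CHUNK) {
            chunks.add(new int[Math.min(CHUNK, remaining)]);
        }
    }

    public int get(int i)         { return chunks.get(i / CHUNK)[i % CHUNK]; }
    public void set(int i, int v) { chunks.get(i / CHUNK)[i % CHUNK] = v; }
    public int size()             { return size; }
}
```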
Heap lock allocation
If the Garbage Collector cannot find a big enough chunk of free storage, allocation fails and the Garbage Collector must perform a garbage collection. After a garbage collection cycle, if the Garbage Collector has freed enough storage, it searches the freelist again and picks up a free chunk. The heap lock is released either after the object has been allocated, or if not enough free space is found. If the Garbage Collector still cannot find enough free storage, it throws an OutOfMemoryError.
Cache allocation is specifically designed to deliver the best possible allocation performance for small objects. Objects are allocated directly from a thread local allocation buffer that the thread has previously allocated from the heap. A new object is allocated from this cache without the need to grab the heap lock; therefore, cache allocation is very efficient.
All objects less than 512 bytes (768 bytes on 64-bit JVMs) are allocated from the cache. Larger objects are allocated from the cache if they can be contained in the existing cache; if not a locked heap allocation is performed.
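The cache allocation rules above can be sketched as follows. This is a conceptual model, not IBM's implementation: the TLH size is illustrative, and the counter only stands in for the real heap lock.

```java
// Conceptual sketch of cache (TLH) allocation, not the JVM's implementation:
// small objects bump-allocate from the thread's buffer; larger objects use the
// buffer only if they fit in the remaining space, else a locked heap allocation.
public class TlhSketch {
    static final int TLH_SIZE = 2 * 1024;   // per-thread buffer size, illustrative
    static final int SMALL_LIMIT = 512;     // small-object threshold from the text

    int tlhUsed = 0;                        // bump pointer within the current TLH
    int lockedAllocations = 0;              // allocations that needed the heap lock

    /** Returns true if the request was satisfied from the thread's cache. */
    boolean allocate(int bytes) {
        if (bytes <= SMALL_LIMIT || tlhUsed + bytes <= TLH_SIZE) {
            if (tlhUsed + bytes > TLH_SIZE) {
                tlhUsed = 0;                // refresh the TLH (takes the heap lock once)
                lockedAllocations++;
            }
            tlhUsed += bytes;
            return true;
        }
        lockedAllocations++;                // large object: locked heap allocation
        return false;
    }
}
```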
Large Object Area (LOA)
The LOA is an area of the tenure heap used solely to satisfy allocations for large objects that cannot be satisfied in the main area of the tenure heap, which is referred to in the rest of this section as the small object area (SOA). A large object in this context is considered to be any object 64 KB or greater in size; allocations for new TLH objects are not considered to be large objects. The large object area is allocated by default for all GC policies except -Xgcpolicy:subpool (on AIX®, Linux® PPC and zSeries®, z/OS®, and i5/OS®) but, if it is not used, it shrinks to zero after a few collections. It can be disabled explicitly by specifying the -Xnoloa command-line option.
Initialization and the LOA
The LOA boundary is calculated when the heap is initialized, and recalculated after every garbage collection. The size of the LOA can be controlled using command-line options: -Xloainitial, -Xloaminimum, and -Xloamaximum.
All options take values between 0 and 0.95 (0% through 95% of the current tenure heap size). If not specified, the default minimum size of the LOA is 0% of the current tenure area, the default maximum size is 50%, and the default initial size is 5%.
Expansion and shrinkage of the LOA
The Garbage Collector uses the following algorithm:
• If an allocation failure occurs in the SOA:
o If the current size of the LOA is greater than its initial size and if the amount of free space in the LOA is greater than 70%, reduce by 1% the percentage of space that is allocated to the LOA.
o If the current size of the LOA is equal to or less than its initial size, and if the amount of free space in the LOA is greater than 90%:
If the current size of the LOA is greater than 1% of the heap, reduce by 1% the percentage of space that is allocated to the LOA.
If the current size of the LOA is 1% or less of the heap, reduce by 0.1% the percentage of space that is allocated to the LOA.
• If an allocation failure occurs on the LOA:
o If the size of the allocation request is greater than 20% of the current size of the LOA, increase the LOA by 1%.
o If the current size of the LOA is less than its initial size, and if the amount of free space in the LOA is less than 50%, increase the LOA by 1%.
o If the current size of the LOA is equal to or greater than its initial size, and if the amount of free space in the LOA is less than 30%, increase the LOA by 1%.
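The expansion and shrinkage rules above translate directly into code. The following sketch codifies the bullets; the method and variable names are ours, and the 5% initial size is the default from this section.

```java
// Direct transcription of the LOA resize rules into a sketch. loaRatio is the
// fraction of the tenure area given to the LOA; names are ours, not the JVM's.
public class LoaResize {
    static final double INITIAL = 0.05;   // default -Xloainitial from the text

    /** Apply the shrink rules after an allocation failure in the SOA. */
    static double shrinkAfterSoaFailure(double loaRatio, double loaFreeFraction) {
        if (loaRatio > INITIAL && loaFreeFraction > 0.70) {
            return loaRatio - 0.01;                       // free space > 70%: shrink 1%
        }
        if (loaRatio <= INITIAL && loaFreeFraction > 0.90) {
            // shrink 1% if the LOA is above 1% of the heap, else 0.1%
            return loaRatio > 0.01 ? loaRatio - 0.01 : loaRatio - 0.001;
        }
        return loaRatio;
    }

    /** Apply the expand rules after an allocation failure in the LOA. */
    static double expandAfterLoaFailure(double loaRatio, double loaFreeFraction,
                                        double requestVsLoaSize) {
        if (requestVsLoaSize > 0.20                               // request > 20% of LOA
                || (loaRatio < INITIAL && loaFreeFraction < 0.50)
                || (loaRatio >= INITIAL && loaFreeFraction < 0.30)) {
            return loaRatio + 0.01;                               // expand by 1%
        }
        return loaRatio;
    }
}
```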
Allocation in the LOA
When allocating an object, the allocation is first attempted in the SOA. If it is not possible to find a free entry of sufficient size to satisfy the allocation, and the size of the request is 64 KB or greater, the allocation is retried in the LOA. If the size of the request is less than 64 KB, or insufficient contiguous space exists in the LOA, an allocation failure is triggered.
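The allocation path just described can be sketched as a simple decision function. The 64 KB threshold comes from the text; the boolean parameters stand in for a real freelist search.

```java
// Sketch of the SOA-then-LOA allocation decision. The 64 KB threshold is from
// the text; the "fits" parameters stand in for a real freelist search.
public class LoaAllocation {
    static final int LARGE_OBJECT = 64 * 1024;

    enum Result { SOA, LOA, ALLOCATION_FAILURE }

    static Result allocate(int bytes, boolean fitsInSoa, boolean fitsInLoa) {
        if (fitsInSoa) {
            return Result.SOA;                 // the SOA is always tried first
        }
        if (bytes >= LARGE_OBJECT && fitsInLoa) {
            return Result.LOA;                 // retried in the LOA only for >= 64 KB
        }
        return Result.ALLOCATION_FAILURE;      // triggers a garbage collection
    }
}
```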
Detailed description of garbage collection
The first step in garbage collection is to acquire exclusive control on the virtual machine to prevent any further Java™ operations. Garbage collection then goes through three phases: mark, sweep, and, if required, compaction. The IBM® Garbage Collector is a stop-the-world (STW) operation, because all application threads are stopped while the garbage is collected.
In the mark phase, all the live objects are marked. Because unreachable objects cannot be identified directly, the Garbage Collector must instead identify all the reachable objects; everything else must be garbage.
Parallel mark: The goal of parallel mark is to not degrade mark performance on a uniprocessor, and to increase typical mark performance on a multiprocessor system. Object marking is speeded up through the addition of helper threads. By default, a platform with n processors also has n-1 new helper threads. You can override the default number of threads by using the -Xgcthreads option. If you specify a value of 1, there are no helper threads. The -Xgcthreads option accepts any value greater than 0, but clearly you gain nothing by setting it to more than n-1.
Concurrent mark: Concurrent mark gives reduced and consistent garbage collection pause times as heap sizes increase. It starts a concurrent marking phase before the heap is full. In the concurrent phase, the Garbage Collector scans the roots, that is, the stacks, the JNI references, and the class statics. The stacks are scanned by asking each thread to scan its own stack. These roots are then used to trace live objects concurrently. Tracing is done by a low-priority background thread and by each application thread when it does a heap lock allocation. A stop-the-world (STW) collection is started when one of the following occurs:
• An allocation failure
• A call to System.gc()
• Concurrent mark completes all the marking that it can do
Reduced and consistent pause times are the benefits of concurrent mark, but they come at a cost. Application threads must do some tracing when they are requesting a heap lock allocation. The overhead varies depending on how much idle CPU time is available for the background thread. Also, the write barrier has an overhead.
The -Xgcpolicy command-line parameter is used to enable and disable concurrent mark:
The -Xgcpolicy options have these effects:
• optthruput — Disables concurrent mark. If you do not have pause time problems (as seen by erratic application response times), you get the best throughput with this option. optthruput is the default setting.
• optavgpause — Enables concurrent mark with its default values. If you are having problems with erratic application response times that are caused by normal garbage collections, you can reduce those problems, at the cost of some throughput, by using this option.
• gencon — Requests the combined use of concurrent and generational GC to help minimize the time that is spent in any garbage collection pause.
• subpool — Disables concurrent mark. It uses an improved object allocation algorithm to achieve better performance when allocating objects on the heap. This option might improve performance on SMP systems with 16 or more processors. The subpool option is available only on AIX®, Linux® PPC and zSeries®, z/OS®, and i5/OS®.
On completion of the mark phase the mark bit vector identifies the location of all the live objects in the heap.
The sweep phase uses the mark bit vector to identify those chunks of heap storage that can be reclaimed for future allocations; these chunks are added to the pool of free space.
To avoid filling this free space pool with many small chunks of storage, only chunks of at least a certain size are reclaimed and added to the free pool. The minimum size for a free chunk is currently defined as 512 bytes (768 bytes on 64-bit platforms).
The small areas of storage that are not on the freelist are known as “dark matter”; they are recovered when the objects that are next to them become free, or when the heap is compacted.
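The freelist rule above can be sketched as a filter over the free chunks found by the sweep. The 512-byte minimum is the value from the text (768 bytes on 64-bit platforms); anything smaller is left behind as dark matter.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the freelist-building rule: free chunks below the minimum size are
// left as "dark matter" instead of being added to the free pool.
public class SweepSketch {
    static final int MIN_FREE_CHUNK = 512;   // 768 on 64-bit platforms, per the text

    /** Returns the chunks that go on the freelist; the rest is dark matter. */
    static List<Integer> buildFreelist(List<Integer> freeChunkSizes) {
        List<Integer> freelist = new ArrayList<>();
        for (int size : freeChunkSizes) {
            if (size >= MIN_FREE_CHUNK) {
                freelist.add(size);
            }
        }
        return freelist;
    }
}
```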
Parallel bitwise sweep: Parallel bitwise sweep improves the sweep time by using available processors. In parallel bitwise sweep, the Garbage Collector uses the same helper threads that are used in parallel mark, so the default number of helper threads is also the same and can be changed with the -Xgcthreads option.
Concurrent sweep: Like concurrent mark, concurrent sweep gives reduced garbage collection pause times when heap sizes increase. Concurrent sweep starts immediately after a stop-the-world (STW) collection, and must complete at least a certain subset of its work before concurrent mark is allowed to start, because the mark map used for concurrent mark is also used for sweeping. To enable concurrent sweep, use the -Xgcpolicy:optavgpause parameter; it is active in conjunction with concurrent mark. The optthruput, gencon, and subpool modes do not support concurrent sweep.
When the garbage has been removed from the heap, the Garbage Collector can consider compacting the resulting set of objects to remove the spaces that are between them. The process of compaction is complicated because, if any object is moved, the Garbage Collector must change all the references that exist to it. The default is not to compact.
To keep compaction times to a minimum, the helper threads are used again.
Compaction occurs if any one of the following is true and -Xnocompactgc has not been specified:
• -Xcompactgc has been specified.
• Following the sweep phase, not enough free space is available to satisfy the allocation request.
• A System.gc() has been requested, and either the previous allocation failure garbage collection did not compact or -Xcompactexplicitgc has been specified.
• At least half the previously available memory has been consumed by TLH allocations (ensuring an accurate sample), and the average TLH size falls below 1024 bytes.
• The scavenger is enabled, and the largest object that the scavenger failed to tenure in the most recent scavenge is larger than the largest free entry in tenured space.
• The heap is fully expanded and less than 4% of old space is free.
• Less than 128 KB of the active heap is free.
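The compaction triggers above can be folded into a single predicate. The field names are ours, one per bullet; the thresholds (1024 bytes, 4%, 128 KB) are taken from the list.

```java
// The compaction triggers folded into one predicate. Each field corresponds to
// one bullet in the list; names are ours, not the JVM's.
public class CompactionDecision {
    boolean xCompactGc, xNoCompactGc, xCompactExplicitGc;
    boolean sweepLeftTooLittleSpace, explicitGcRequested, lastAfGcCompacted;
    boolean tlhSampleAccurate; double avgTlhBytes;
    boolean scavengerTenureFailedTooLarge;
    boolean heapFullyExpanded; double oldSpaceFreeFraction;
    long activeHeapFreeBytes;

    boolean shouldCompact() {
        if (xNoCompactGc) return false;              // -Xnocompactgc overrides all
        return xCompactGc
            || sweepLeftTooLittleSpace
            || (explicitGcRequested && (!lastAfGcCompacted || xCompactExplicitGc))
            || (tlhSampleAccurate && avgTlhBytes < 1024)
            || scavengerTenureFailedTooLarge
            || (heapFullyExpanded && oldSpaceFreeFraction < 0.04)
            || activeHeapFreeBytes < 128 * 1024;
    }
}
```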
Subpool (AIX®, Linux® PPC and zSeries®, z/OS® and i5/OS® only)
Subpool is an improved GC policy for object allocation that is specifically targeted at improving the performance of object allocation on SMP systems with 16 or more processors. You invoke it with the -Xgcpolicy:subpool command-line option.
The subpool algorithm uses multiple free lists rather than the single free list used by optavgpause and optthruput. It tries to predict the size of future allocation requests based on earlier allocation requests, and recreates the free lists at the end of each GC based on these predictions. While allocating objects on the heap, free chunks are chosen using a “best fit” method, as opposed to the “first fit” method used in the other algorithms. Subpool also tries to minimize the amount of time for which a lock is held on the Java™ heap, thus reducing contention among allocator threads. Concurrent mark is disabled when the subpool policy is used. Also, the subpool policy uses a new algorithm for managing the Large Object Area (LOA).
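The difference between the “first fit” search used by the other policies and the “best fit” search used by subpool can be illustrated with a simple freelist model; the chunk sizes in the test are arbitrary.

```java
import java.util.List;

// First fit versus best fit over a freelist, to illustrate the difference the
// subpool policy exploits. This is a model, not the JVM's freelist code.
public class FitStrategies {
    /** First fit: the index of the first chunk big enough for the request. */
    static int firstFit(List<Integer> freelist, int request) {
        for (int i = 0; i < freelist.size(); i++) {
            if (freelist.get(i) >= request) return i;
        }
        return -1;  // no chunk large enough
    }

    /** Best fit: the index of the smallest chunk that still satisfies the request. */
    static int bestFit(List<Integer> freelist, int request) {
        int best = -1;
        for (int i = 0; i < freelist.size(); i++) {
            int size = freelist.get(i);
            if (size >= request && (best == -1 || size < freelist.get(best))) {
                best = i;
            }
        }
        return best;
    }
}
```

Best fit wastes less of each chosen chunk, which matters when free lists are segregated by predicted request size, as in subpool.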
During garbage collection, immediately following the mark phase, the lists of reference objects (soft, weak, and phantom) are processed in a specific order:
Soft references are collected if their referent is not marked (if they are not alive) and if #get() has not been called on the reference object within the previous 32 garbage collection cycles.
Weak and phantom references are always collected if their referent is not marked.
An element is eligible for processing if it is marked and has a non-null referent field. For each element on a list, GC determines if the reference object is eligible for processing and then if it is eligible for collection.
If an element is not eligible for processing, the reference object is removed from the reference list, resulting in it being freed during the sweep phase.
If an element is determined to be eligible for processing, GC must determine if it is eligible for collection.
The first criterion here is simple: is the referent marked? If it is marked, the reference object is not eligible for collection, and GC moves on to the next element of the list.
If the referent is not marked, GC has a candidate for collection.
At this point the process differs for each reference type. Soft references are collected if their referent has not been marked for the previous 32 garbage collection cycles. This can be adjusted with the -Xsoftrefthreshold option.
If there is a shortage of available storage, all soft references are cleared. All soft references are guaranteed to have been cleared before the OutOfMemoryError is thrown.
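The memory-sensitive behavior of soft references described above is what makes them suitable for caches. The following minimal cache is a sketch: the collector may clear the reference at any time under memory pressure, so callers must always handle a null result.

```java
import java.lang.ref.SoftReference;

// A minimal memory-sensitive cache using SoftReference. The collector may
// clear the reference under memory pressure, and all soft references are
// guaranteed to be cleared before OutOfMemoryError is thrown.
public class SoftCache<V> {
    private SoftReference<V> ref = new SoftReference<>(null);

    public void put(V value) { ref = new SoftReference<>(value); }

    /** Returns the cached value, or null if the collector has cleared it. */
    public V get()           { return ref.get(); }
}
```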
enqueue: To place an item in a queue.
referent: An instance of a Reference subclass encapsulates a single reference to a particular object, called the referent.
Every reference object provides methods for getting and clearing the reference. Aside from the clearing operation reference objects are otherwise immutable, so no set operation is provided.
The java.lang.ref package provides reference-object classes, which support a limited degree of interaction with the garbage collector.
Soft references are for implementing memory-sensitive caches.
Weak references are for implementing canonicalizing mappings that do not prevent their keys (or values) from being reclaimed.
Phantom references are for scheduling pre-mortem cleanup actions in a more flexible way than is possible with the Java finalization mechanism.
A reference queue is a queue to which registered reference objects are appended by the garbage collector after the appropriate reachability changes are detected.
Going from strongest to weakest, the different levels of reachability reflect the life cycle of an object. They are operationally defined as follows:
An object is strongly reachable if it can be reached by some thread without traversing any reference objects. A newly-created object is strongly reachable by the thread that created it.
An object is softly reachable if it is not strongly reachable but can be reached by traversing a soft reference.
An object is weakly reachable if it is neither strongly nor softly reachable but can be reached by traversing a weak reference. When the weak references to a weakly-reachable object are cleared, the object becomes eligible for finalization.
An object is phantom reachable if it is neither strongly, softly, nor weakly reachable, it has been finalized, and some phantom reference refers to it.
Finally, an object is unreachable, and therefore eligible for reclamation, when it is not reachable in any of the above ways.
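The first two reachability levels can be demonstrated with a WeakReference. The sketch deliberately asserts nothing about collection itself, because System.gc() is only a request.

```java
import java.lang.ref.WeakReference;

// Strong versus weak reachability. While a strong reference exists, the weak
// reference cannot be cleared; once it is dropped, the object is at most
// weakly reachable and the collector may clear the weak reference.
public class ReachabilityDemo {
    public static void main(String[] args) {
        Object strong = new Object();                 // strongly reachable
        WeakReference<Object> weak = new WeakReference<>(strong);

        System.out.println(weak.get() == strong);     // true: still strongly reachable

        strong = null;  // now at most weakly reachable; the collector may clear
                        // the weak reference and make the object finalizable
    }
}
```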
A program may request to be notified of changes in an object’s reachability by registering an appropriate reference object with a reference queue at the time the reference object is created.
Some time after the garbage collector determines that the reachability of the referent has changed to the value corresponding to the type of the reference, it will add the reference to the associated queue. At this point, the reference is considered to be enqueued.
The program may remove references from a queue either by polling or by blocking until a reference becomes available. Reference queues are implemented by the ReferenceQueue class.
The relationship between a registered reference object and its queue is one-sided. That is, a queue does not keep track of the references that are registered with it. If a registered reference becomes unreachable itself, then it will never be enqueued. It is the responsibility of the program using reference objects to ensure that the objects remain reachable for as long as the program is interested in their referents.
While some programs will choose to dedicate a thread to removing reference objects from one or more queues and processing them, this is by no means necessary. A tactic that often works well is to examine a reference queue in the course of performing some other fairly-frequent action. For example, a hashtable that uses weak references to implement weak keys could poll its reference queue each time the table is accessed. This is how the WeakHashMap class works. Because the ReferenceQueue.poll method simply checks an internal data structure, this check will add little overhead to the hashtable access methods.
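ReferenceQueue polling can be demonstrated deterministically by calling enqueue() by hand, which is what the collector would otherwise do after the reachability change.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Polling a ReferenceQueue, as WeakHashMap does internally. enqueue() is
// called by hand so the demo does not depend on when the collector runs.
public class QueuePollDemo {
    public static void main(String[] args) {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object referent = new Object();
        WeakReference<Object> ref = new WeakReference<>(referent, queue);

        System.out.println(queue.poll()); // null: nothing enqueued yet

        ref.enqueue();                    // normally the collector does this
        Reference<?> r = queue.poll();    // cheap check of an internal structure
        System.out.println(r == ref);     // true
    }
}
```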
From the java.lang.ref.Reference.java comments:
A Reference instance is in one of four possible internal states:
Active: Subject to special treatment by the garbage collector. Some time after the collector detects that the reachability of the referent has changed to the appropriate state, it changes the instance’s state to either Pending or Inactive, depending upon whether or not the instance was registered with a queue when it was created. In the former case it also adds the instance to the pending-Reference list. Newly-created instances are Active.