An Overview on Garbage Collector Algorithms

Before we move into the details of algorithms that are used in Java Virtual Machine (JVM), let’s have a common understanding of what GC Roots are. These are the objects which are directly accessible from outside the heap memory.
(1) Active Threads
(2) Static Variables
(3) Local Variables (Accessible via Stack of a Thread)
(4) JNI References

There are three basic types of algorithms that are used in Java Virtual Machine (JVM):-
(1) Mark-sweep
(2) Mark-sweep-compact
(3) Mark-copy

Mark:- All of the algorithms discussed have the same mark phase. Marking phase is about traversing the whole object graph, starting from GC Roots. When GC visits the object, it marks it as accessible and thus alive. All the objects which are not reachable from GC Roots are garbage. Marking requires Stop-The-World (STW) pauses, because the running application threads could interfere. How long the STW pause is, depends mostly on the number of visited objects.

Mark-Sweep:-After marking phase, we have the memory space which is occupied by visited (accessible via GC Roots) and unvisited objects. Sweep phase releases the memory fragments which contains unreachable objects. It is simple, but because the dead objects are not necessarily next to each other, we end up having a fragmented memory. That’s not bad but trying to fit a too large object into the memory could potentially lead to OutOfMemoryError.

Mark-Sweep-Compact:-This algorithm fixes the problem with fragmented memory. After all alive objects are marked, they are moved to the beginning of the memory space. That helps to avoid having too fragmented (split) memory, but compacting the heap isn’t for free. Copying objects and updating all references to them take time and it all happens during STW pause.

Mark-Copy:- Mark-copy algorithm copies all alive objects to a new memory region. The previously occupied region is considered to be free. Next time mark-copy is executed, all the alive objects are moved back to the previous memory region. As you can imagine, this, of course, leads to a memory compaction. Unfortunately, it requires additional extra region large enough to fit all live objects at any given point in time.

Outlining Garbage Collectors

Java Virtual Machine (JVM) heap is divided into two different Generations. One is called Young Generation and the second one is Old Generation (sometimes referred to as Tenured). The Young Generation is further separated into two main logical sections: Eden and Survivor spaces. There are also Virtual spaces for both Young and the Old Generations which are used by Garbage Collectors to resize other regions – mainly to meet different GC goals.

Object Life Cycle
Objects start their journey in Eden space of the Young Generation. When Eden fills up, so called Minor GC is performed: all application threads are stopped (Stop-The-World Pause), objects which are not used anymore are discarded and all other objects from the Eden are moved to the first Survivor space (S0). Next time a Minor GC is performed, the objects goes from S0 to the second Survivor space (S1). All live objects from Eden goes to S1 as well. Notice that it leads to differently aged object in the Survivor space – we have objects from Eden and objects which were already in the Survivor space. Next iteration of Minor GC moves the objects from S1 back to the S0, so the Survivor spaces switch every GC. It is significant to understand the switching of objects in Survivor spaces because when object reaches certain age threshold, it is promoted to the Old Generation. It leads to Survivor space fragmentation (division) which can be easily eliminated with moving all objects from S0 to S1 and back every Minor GC. Eventually, when the Old Generation fills up, a Major GC will be performed on the Old Generation which cleans it up and compacts that space. If and how Stop-The-World pauses occur during Major GC depends on specific GC algorithm used.

Types of Garbage Collectors
Let’s have an overview on three types of garbage collectors:-

(1) Serial Collector: As the name suggests, the collection is performed by only one thread. Stop-the-world (STW) pauses are necessary during both Minor and Full GC. This collector uses the Mark-Copy algorithm for the Young Generation, whereas the Old Generation is cleaned up using the Mark-Sweep-Compact algorithm. Serial GC is designed for single-threaded environments (usually client-class machines) and for relatively small heaps.

(2) Parallel Collector: The Young collection is parallelized by multiple threads which makes Minor GC much faster. As a result, this collector leads to shorter, but more frequent Young collection STW pauses. Since JDK 7u4 [Java 7 update release 4], the Old Generation is also collected by multiple threads by default (and also causes stop-the-world pauses). This collector also uses the Mark-Copy algorithm in the Young Generation and Mark-Sweep-Compact in the Old Generation, but both copy and compact phases are executed by multiple threads.

(3) Concurrent Mark and Sweep Collector: Minor GC is performed with multiple threads using the parallel Mark-Copy algorithm. All application threads are stopped then. The Old Generation is mostly collected concurrently – application threads are paused for very short periods of time when the background GC thread scans the Old Generation. The actual algorithm used during Major GC is concurrent Mark-Sweep. Concurrent Mark and Sweep is the collector which doesn’t compact the Tenured space (Old Generation) and thus the memory can be left fragmented. Due to lack of heap compaction, when GC is not able to fit new objects into the memory, JVM fallbacks to the serial mark-sweep-compact algorithm to defragment and compact the Old Generation. That’s when performance degradation comes into play – all application threads are stopped and just one single thread is responsible for cleaning and compacting the Tenured space.

(4) G1GC (Garbage First Garbage Collector): Garbage First (G1) is a new low-pause garbage collector designed to process large heaps with minimal pauses. The heap is broken down into several regions of fixed size (while still maintaining the generational nature of the heap). That kind of design allows us to get rid of long STW pauses when the entire Young or Old Generations are processed. Now, each region can be collected separately which leads to shorter, but more frequent STW pauses. G1 copies objects from one region into another, which means that the heap is at least partially compacted. G1 uses an incremental version of the Mark-Sweep-Compact algorithm.

PermGen and MetaSpace
PermGen is an abbreviation for Permanent Generation and it’s a special heap space which is separate from the main Java heap where JVM keeps track of metadata of the classes which have been loaded. In Java 8, PermGen has been renamed to Metaspace – with some subtle differences. From dvelopment perspective, it is important to note that Metaspace has unlimited default maximum size. On the contrary, PermGen from Java 7 and earlier has the default maximum size of 64 MB on 32-bit JVM and 82 MB on 64-bit one.