Why collect garbage?
Once an application has been running for a while, it will have allocated and deallocated many memory chunks from the heap, which leaves numerous holes in the heap. Eventually, a request will be made that results in the Next Object pointer moving off the top of the managed heap. When this happens, .NET can’t allocate any more memory from the top of the heap, even though the requested amount of memory might exist elsewhere in the heap. To resolve this condition, the CLR will force a garbage collection to remove all the unwanted blocks and reshuffle the remaining, still allocated, blocks into a contiguous stretch. The process is similar to defragmenting a hard disk.
Garbage collection is rendered possible because .NET keeps track of all allocated memory that’s still in use, which it does by keeping root pointers that point to active blocks – when a block of memory is no longer needed, its root pointer gets zeroed. Hence, to perform a collection, .NET has to obtain all the active memory roots and then walk through all the blocks they point to looking for references to other memory objects. Eventually, the garbage collection process will have created a graph of all the active memory blocks currently in use by the application and can use it to compact these blocks into a single, contiguous memory region. The garbage collection also has to adjust all existing memory pointers so that each application still knows where to access its allocated blocks at their new location.
To simplify and optimise this process, .NET maintains three generations of compacted data (referred to as Gen0, Gen1 and Gen2). The first time the garbage collection runs, all memory blocks are put in Gen0. Following this first run, any further active blocks get promoted to Gen1. Next time the garbage collection runs, it can either do a complete pass of the entire heap or else just compress what’s in Gen0. And after the next full run, objects in Gen1 get promoted to Gen2, while the newest allocated memory in Gen0 gets promoted to Gen1. This is a neat algorithm, since over time the contents of Gen2 will tend to remain pretty stable while Gen0 is more dynamic. Gen2 ends up with those long-lived objects opened by an application at start-up, which are in use until the application shuts down, so its contents tend to be less suitable candidates for much compaction. The contents of Gen0 on the other hand tend to be more dynamic, which gives more opportunities for compaction. While some of Gen0 may in due course be promoted to Gen1 or 2, much of Gen0 contains short-lived working variables.
Breaking up the garbage collection process in this way allows .NET to run a collection simply over Gen0 to compact the most volatile part of the heap, while running Gen1 and full garbage collections less frequently. This strategy works well in most cases, since the contents of Gen2 tend to change less and running a garbage collection over them is correspondingly less rewarding for the work involved.
Microsoft has implemented three different garbage collectors: a workstation version that is single threaded, a multithreaded version for systems with multiple CPUs and a third for the .NET Compact framework. Since the application has to halt while .NET performs a garbage collection, on a multiprocessor system .NET can use all the CPUs to check parts of the managed heap in parallel, thus speeding up the checking, which improves the performance of the garbage collection over the single-threaded version. At the other end of the scale, the garbage collection algorithm on the .NET Compact Framework is simpler, to minimise the overhead.