July 9, 2009

Garbage Collection

MEMORY MANAGEMENT :

In .net the memory management is handled by the CLR by means of Garbage Collector automatically . it solves the developers problem of keeping track of all the references and later freeing them when they are not in use. Which is a tedious task is done manually and if not done accurately can lead to memory leaks.

The Garbage Collection Algorithm.
Note: Roots identify storage locations, which refer to the objects on the managed heap or to objects that are set to null.

This algorithm involves the two phases/steps (mark and compact).
In the mark phase , aim is to identify the memory to be claimed and mark them. As the collector starts running it starts from the root object (usually global or static object pointers) that is identified by the JIT compiler and then traverses to the object chain. It traces the objects and add to the graph.
Examples of root objects :
• Global and static object pointers
• Any local variable/parameter pointers on a thread’s stack.
• Any CPU registers containing pointers in the managed heap.
The list of the active root is maintained by the JIT compiler and CLR and is made accessible to the garbage collector.
The GC starts from the live references/roots and walk through all the objects building a graph of all the objects that are reachable. It stops to traverse the path as soon as it encounters an object that is already present in the graph. Hence, doesn’t go through the objects more then once, and prevents the infinite loops in case of circular linked lists of objects, thus handling cycles properly. Once all the roots are marked, the graph has the set of all the objects that somehow reachable, while all the other objects which are not reachable are considered as garbage.
Now comes the second phase into picture i.e, “compact” phase. Here all the live objects are moved to the bottom of the heap , leaving the contiguous free memory space. Moving the objects invalidate all pointers , so the garbage collector, modifies the application’s roots to point to the new locations. Also it takes into account those objects that hold pointers to other object and correct that.
After all the garbage has been identified, all the non-garbage has been compacted, and all the non-garbage pointers have been fixed-up, a pointer is positioned just after the last non-garbage object to indicate the position where the next object can be added.



As can be seen from the above figure , the objects B,C are not referenced by any of the roots. So they are considered as the garbage and will be compacted.
To improve the performance CLR classifies the memory heap in 3 generations . This classification is made on the basis of the fact that newly created object tend to have short life. While older the object longer it will survive. There is no predefined time for a garbage collector to run. When the memory heap reaches a threshold then only it runs to release memory from the unused / non-referenced resources.



GC Generations Fig A
When initialized the heap has no objects. All new objects added to the heap are in Generation 0, once the heap gets filled up, the GC is invoked. As most of the objects are short lived only small number will survive the first collection. Those which survive the collection are moved to the Generation 1 level. So we have compact free space in Generation 0. When the next time GC is invoked, again the new survived objects are promoted to generation 1 while the generation 1 objects are moved to the generation 2. So as the objects mature in there present generation they are promoted to the next generation with each garbage collection.

No comments: