Sunday, June 21, 2009

MEMORY LAYOUT FOR VIRTUAL FUNCTIONS

This article is for beginners who want to gain in-depth knowledge of how virtual functions work.

I will not explain the definition and usage of virtual functions, which you can get from MSDN or any C++ book.

Let’s start by understanding how a simple class is laid out in memory:

#include <iostream>
using std::cout;

class Base
{
private:
    int iBVar;
public:
    void bMethod1()
    {
        cout << "Base::bMethod1()";
    }
    void bMethod2()
    {
        cout << "Base::bMethod2()";
    }
};
The memory layout for Base is:

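As shown in the figure above, an object of type Base holds only its data member iBVar; the code for the non-virtual methods bMethod1() and bMethod2() lives in the code segment and takes no space inside the object.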
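
One way to check this on your own compiler is to compare the size of the class with the size of its only data member. The snippet below is a minimal sketch, not part of the original example: PlainBase, value() and plainBaseSize() are hypothetical names, and the methods return their text instead of printing it so the result is easy to inspect.

```cpp
#include <cstddef>

// PlainBase is a hypothetical stand-in for the Base class above:
// one int data member and two non-virtual member functions.
class PlainBase
{
private:
    int iBVar = 0;
public:
    int value() const { return iBVar; }
    const char* bMethod1() const { return "Base::bMethod1()"; }
    const char* bMethod2() const { return "Base::bMethod2()"; }
};

// Non-virtual member functions are ordinary code shared by all objects,
// so they do not contribute to the size of each object.
constexpr std::size_t plainBaseSize() { return sizeof(PlainBase); }
```

On mainstream compilers plainBaseSize() equals sizeof(int), because the object stores nothing besides iBVar.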

Now let’s modify the Base class above and add two virtual functions to it:

class Base
{
private:
    int iBVar;
public:
    void bMethod1()
    {
        cout << "Base::bMethod1()";
    }
    void bMethod2()
    {
        cout << "Base::bMethod2()";
    }

    virtual void bMethod3()
    {
        cout << "Base::virtual::bMethod3()";
    }
    virtual void bMethod4()
    {
        cout << "Base::virtual::bMethod4()";
    }
};



As shown in the figure above, a Virtual Table has been added between the class and the code segment; it holds pointers to the virtual functions. Let’s look at the newly added items:

vptr (Virtual Pointer):
> It is the pointer that holds the address of the Virtual Table.
> vptr is associated with the object: each object has its own vptr, and the vptrs of all objects of the same class point to the same VTable.
> vptr is added to the object’s memory space only if the class declares or inherits at least one virtual function.
> vptr is a hidden member added automatically by the compiler. Its position is implementation-specific; with common compilers and single inheritance, as in this example, it is placed at the start of the object.
> In that layout the address of vptr equals the base address of the object. So if the memory at the base address of the object gets corrupted, invoking a virtual function on that object typically crashes the application.
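
The extra hidden member can be observed indirectly through sizeof. The following is a small sketch under assumptions: PlainBase and VirtualBase are hypothetical names for the two versions of Base shown above, and the exact sizes depend on the compiler and platform, so the code only reports the difference rather than a fixed number.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for Base without virtual functions.
class PlainBase
{
    int iBVar = 0;
public:
    int value() const { return iBVar; }
};

// Hypothetical stand-in for Base with two virtual functions.
class VirtualBase
{
    int iBVar = 0;
public:
    int value() const { return iBVar; }
    virtual const char* bMethod3() const { return "Base::virtual::bMethod3()"; }
    virtual const char* bMethod4() const { return "Base::virtual::bMethod4()"; }
};

// Extra bytes per object once the class has virtual functions:
// the hidden vptr plus any alignment padding the compiler inserts.
constexpr std::size_t vptrOverhead()
{
    return sizeof(VirtualBase) - sizeof(PlainBase);
}
```

On typical implementations vptrOverhead() is at least sizeof(void*), and std::is_polymorphic reports true only for VirtualBase. Note that the overhead is the same whether the class has one virtual function or many, because the object stores a single pointer, not the table itself.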

Virtual Table (VTable):
> It is a table generated by the compiler to hold the addresses of the functions declared as virtual. The entries appear in the order in which the virtual functions are declared in the class. Here bMethod3() is placed at index 0 and bMethod4() at index 1.
> The virtual table is associated with the class, so there is always exactly one VTable per class regardless of how many objects are created, and every object of that class shares it.
For example,
Base obj1, obj2, obj3;
In memory, obj1, obj2 and obj3 each hold their own iBVar and their own vptr, and all three vptrs point to the single VTable of Base.
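
The sharing described above can be peeked at directly, with a big caveat: the sketch below assumes the vptr occupies the first pointer-sized bytes of the object, which holds for this simple class on common compilers but is not guaranteed by the C++ standard. readVptr() and shareOneVTable() are hypothetical helpers meant for exploration only, not for production code.

```cpp
#include <cstring>

class Base
{
    int iBVar = 0;
public:
    int value() const { return iBVar; }
    virtual const char* bMethod3() const { return "Base::virtual::bMethod3()"; }
    virtual const char* bMethod4() const { return "Base::virtual::bMethod4()"; }
};

// ASSUMPTION: the vptr is stored in the first sizeof(void*) bytes of the
// object. This is implementation-specific and not portable.
const void* readVptr(const Base& obj)
{
    const void* vptr = nullptr;
    std::memcpy(&vptr, &obj, sizeof(vptr));  // copy the raw bytes of the hidden pointer
    return vptr;
}

// Three distinct objects of the same class should carry the same vptr value.
bool shareOneVTable()
{
    Base obj1, obj2, obj3;
    return readVptr(obj1) == readVptr(obj2) &&
           readVptr(obj2) == readVptr(obj3);
}
```

Under that assumption shareOneVTable() returns true: the objects live at different addresses, yet the pointer stored at the start of each one has the same value.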




Let’s declare one more class, publicly derived from Base, to look at virtual functions from the inheritance point of view:


class Derived : public Base
{
public:
    // Overrides Base::bMethod3(); it is virtual because the base declaration is.
    void bMethod3()
    {
        cout << "Derived::bMethod3()";
    }
};


When a class is derived from a parent class that contains virtual functions, the compiler first copies the entries of Base::VTable into Derived::VTable:

VTable for Base Class
==================
Base::bMethod3()
Base::bMethod4()

VTable for Derived Class
====================
Base::bMethod3()
Base::bMethod4()

So, even if the Derived class doesn’t override any virtual method, it still has its own VTable.

Next, the compiler looks for overridden methods in the Derived class and replaces the old entries with the overriding methods. In our case, bMethod3() is overridden in Derived, so its entry is replaced in the VTable of Derived.
Updated VTable of Derived Class
===========================
Derived::bMethod3()
Base::bMethod4()
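
The copy-then-replace idea can be modelled by hand with an array of function pointers. This is only a conceptual sketch: a real compiler builds the finished tables at compile time rather than copying them at run time, and every name here (Slot, VTable, makeBaseVTable, makeDerivedVTable and the free functions) is hypothetical.

```cpp
#include <array>

// A slot holds the address of one function.
using Slot = const char* (*)();
// A hand-made "VTable" with one slot per virtual function.
using VTable = std::array<Slot, 2>;

const char* baseMethod3()    { return "Base::bMethod3()"; }
const char* baseMethod4()    { return "Base::bMethod4()"; }
const char* derivedMethod3() { return "Derived::bMethod3()"; }

// Base's table lists its virtual functions in declaration order.
VTable makeBaseVTable()
{
    return VTable{{ baseMethod3, baseMethod4 }};
}

VTable makeDerivedVTable()
{
    VTable table = makeBaseVTable();  // step 1: start from a copy of Base's entries
    table[0] = derivedMethod3;        // step 2: replace the slot of the overridden method
    return table;
}
```

After the two steps, slot 0 of the derived table refers to derivedMethod3 while slot 1 still refers to baseMethod4, matching the updated table listed above.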

Let’s write some code to invoke these methods:

Base *pBase;
Base bObj;
Derived dObj;

bObj.bMethod3(); // invokes Base::bMethod3()

pBase = &bObj;
pBase->bMethod3(); // invokes Base::bMethod3()

pBase = &dObj;
pBase->bMethod3(); // invokes Derived::bMethod3()
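
To try these calls yourself, here is a compact, self-contained variant. It is a sketch that differs from the classes above in a few labelled ways: the methods return a std::string instead of printing so the result can be checked, the override keyword and a virtual destructor are added as good practice, and callThroughPointer() is a hypothetical helper.

```cpp
#include <string>

class Base
{
public:
    virtual ~Base() = default;
    virtual std::string bMethod3() const { return "Base::virtual::bMethod3()"; }
    virtual std::string bMethod4() const { return "Base::virtual::bMethod4()"; }
};

class Derived : public Base
{
public:
    // Replaces the bMethod3 entry; bMethod4 is inherited unchanged.
    std::string bMethod3() const override { return "Derived::bMethod3()"; }
};

// The same statement reaches different functions depending on
// the dynamic type of the object pBase points to.
std::string callThroughPointer(const Base* pBase)
{
    return pBase->bMethod3();
}
```

Passing the address of a Base object yields the Base text, while passing the address of a Derived object yields the Derived text, even though callThroughPointer() only knows about Base.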

Let’s look at the pseudocode generated by the compiler for some of the above statements:

bObj.bMethod3();
Assembly Code: Call bMethod3()
Description: Resolved at compile time; it doesn’t depend on any run-time values.


pBase->bMethod3();
Assembly Code: Call (Get the first pointer-sized bytes at the address stored in pBase (i.e. vptr; 4 bytes on a 32-bit build) -> Get content of vptr (i.e. address of VTable) -> Get address of function[offset])
Description: The call sequence and the offset are fixed at compile time, but the function actually called depends on the run-time value of the address stored in pBase (for example pBase = &bObj or pBase = &dObj). So which version of the method is invoked depends entirely on the object whose address is stored in pBase. Hence one can call different methods through the same pBase, which is nothing but the OOP concept of polymorphism: one item taking different forms.


I hope this gives you a good understanding of virtual functions. Suggestions and feedback are always welcome.
