Avoid virtual functions in C++.

Xiahua Liu August 27, 2024 #Design Patterns #C++

Virtual functions are a core feature of C++ and have been part of the language from the very beginning. They give C++ something that C does not have: run-time polymorphism. However, many people ignore their dark side.

Example

Let's say you are going to write a database application. Each data type inside the database must have an associated print() function that returns a std::string, so that people can see the data.

We can define an abstract base class and declare print() as a pure virtual function there, so that we can override it later for each of our data types, for example Int1.

Example code can be found HERE:

However, even though Int1 has only one int member variable, storing it takes 16 bytes of space on a typical 64-bit platform!

Int2, which does not have any virtual functions at all, takes only 4 bytes to store. If we are going to store billions of Int1 objects in our database, Int1 will definitely be a poor choice.
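The linked example is not reproduced here, so the following is only a minimal sketch of what Int1 and Int2 might look like. The member name value, the virtual destructor, and the check_print() helper are assumptions for illustration, not the author's original code.

```cpp
#include <cassert>
#include <string>

// Abstract interface: every data type must provide print().
struct BaseData {
  virtual ~BaseData() = default;  // assumption: added for safe deletion via base
  virtual std::string print() const = 0;
};

// Virtual version: stores a hidden vtable pointer next to `value`.
struct Int1 : BaseData {
  explicit Int1(int v) : value(v) {}
  std::string print() const override { return std::to_string(value); }
  int value;
};

// Plain version: no virtual functions, so only the int is stored.
struct Int2 {
  explicit Int2(int v) : value(v) {}
  std::string print() const { return std::to_string(value); }
  int value;
};

// Int2 is exactly one int; Int1 needs room for a pointer as well.
static_assert(sizeof(Int2) == sizeof(int), "Int2 stores only its int");
static_assert(sizeof(Int1) >= sizeof(void*) + sizeof(int),
              "Int1 also stores a vtable pointer");

// Hypothetical helper: both versions print the same text.
inline void check_print() {
  assert(Int1(42).print() == Int2(42).print());
}
```

On x86-64 with GCC or Clang, sizeof(Int1) is typically 16, but the sketch only asserts the weaker, portable bound.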

Where does this size difference come from?

If you read the assembly code on godbolt, you can see that the Int1 constructor has to store an additional value, vtable for Int1+16, which is an address inside the vtable for Int1 in memory.

Int1::Int1(int) [base object constructor]:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     QWORD PTR [rbp-8], rdi
        mov     DWORD PTR [rbp-12], esi
        mov     rax, QWORD PTR [rbp-8]
        mov     rdi, rax
        call    BaseData::BaseData() [base object constructor]
        mov     edx, OFFSET FLAT:vtable for Int1+16
        mov     rax, QWORD PTR [rbp-8]
        mov     QWORD PTR [rax], rdx    # Write 8 bytes (vtable address)
        mov     rax, QWORD PTR [rbp-8]
        mov     edx, DWORD PTR [rbp-12]
        mov     DWORD PTR [rax+8], edx  # Write 4 bytes (input int value)
        nop
        leave
        ret

This extra store does not exist for Int2. The constructor of Int2 only writes 4 bytes:

Int2::Int2(int) [base object constructor]:
        push    rbp
        mov     rbp, rsp
        mov     QWORD PTR [rbp-8], rdi
        mov     DWORD PTR [rbp-12], esi
        mov     rax, QWORD PTR [rbp-8]
        mov     edx, DWORD PTR [rbp-12]
        mov     DWORD PTR [rax], edx  # Write 4 bytes (input int value)
        nop
        pop     rbp
        ret

The reason for the extra space is that, for an Int1 object to call its virtual functions, the program first needs to find the vtable that belongs to Int1. The vtable pointer takes 8 bytes, and 4 more bytes of padding keep the object 8-byte aligned, which brings the total to 16 bytes.

Why do we need to store this value? Can't we just hardcode it in the Int1::print() function?

Imagine we static_cast a reference to our Int1 object to the base class type, BaseData&. Even though the resulting reference has type BaseData, the vtable pointer stored in the object still points to the vtable for Int1. In short:

The vtable pointer of an object is set by the type it was constructed as, not by the type through which it is accessed at run time.

This gives us Run-time Polymorphism, but also pain.
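To make this concrete, here is a small sketch of a call through a base reference. The Text1 type, the print_via_base() function, and the "Int1:" / "Text1:" prefixes are hypothetical and only serve to show which override runs.

```cpp
#include <cassert>
#include <string>
#include <utility>

struct BaseData {
  virtual ~BaseData() = default;
  virtual std::string print() const = 0;
};

struct Int1 : BaseData {
  explicit Int1(int v) : value(v) {}
  std::string print() const override { return "Int1:" + std::to_string(value); }
  int value;
};

// Hypothetical second data type, added only for comparison.
struct Text1 : BaseData {
  explicit Text1(std::string t) : text(std::move(t)) {}
  std::string print() const override { return "Text1:" + text; }
  std::string text;
};

// The static type of `d` is always BaseData. Which print() runs is decided
// at run time through the vtable pointer stored inside the object.
inline std::string print_via_base(const BaseData& d) { return d.print(); }

// Hypothetical helper: casting to the base reference does not change dispatch.
inline void check_dispatch() {
  Int1 i(5);
  const BaseData& as_base = static_cast<const BaseData&>(i);
  assert(print_via_base(as_base) == "Int1:5");
}
```

Because print_via_base() cannot know the dynamic type of its argument, the vtable pointer has to travel with every object.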

Performance issues

This vtable indirection hurts our software performance in several ways:

We cannot inline the function anymore. The program has to load the function address from the vtable first, then jump to the function definition.
For the compiler, this two-step jump limits its optimization capability. This can be seen in our example: because Int2::print() is never called, the compiler did not even generate the function. However, the compiler cannot do the same for Int1::print(), because it cannot predict when and where this function will be called.
There are also memory caching issues, as you can imagine, because we read from different memory locations back and forth.

What about `final`?

Even though you can use the final keyword to let the compiler devirtualize calls when possible, this optimization is not reliable.

The run-time polymorphism that virtual functions provide is a double-edged sword: it gives us flexibility, but it also limits the compiler's optimization capability.
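As a rough illustration of where final can and cannot help, consider the sketch below. The function names print_direct() and print_indirect() are hypothetical, and whether a given compiler actually devirtualizes the first call depends on the compiler and optimization level.

```cpp
#include <cassert>
#include <string>

struct BaseData {
  virtual ~BaseData() = default;
  virtual std::string print() const = 0;
};

// `final` promises that no class derives from Int1.
struct Int1 final : BaseData {
  explicit Int1(int v) : value(v) {}
  std::string print() const override { return std::to_string(value); }
  int value;
};

// Static type is Int1: no further override can exist, so the compiler is
// allowed (but not required) to call Int1::print() directly and inline it.
inline std::string print_direct(const Int1& x) { return x.print(); }

// Static type is BaseData: the dynamic type is unknown at this point, so
// `final` on Int1 does not help and a vtable lookup is usually emitted.
inline std::string print_indirect(const BaseData& x) { return x.print(); }

// Hypothetical helper: both paths must still produce the same result.
inline void check_final() {
  Int1 i(9);
  assert(print_direct(i) == print_indirect(i));
}
```

Note that the object still carries its vtable pointer either way, so final does nothing for the size problem.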

How to NOT use virtual functions?

We can use a design pattern called Curiously Recurring Template Pattern (CRTP).

Here is the example code

The assembly code for Int3 and Int2 is identical, but since Int3 inherits from BaseData<Int3>, the user must now implement user_print() for Int3 or get a compilation error when print() is called.

The Int3 implementation is almost identical to the virtual function version, but it avoids the vtable overhead.
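A minimal CRTP sketch along these lines could look as follows. The names BaseData<T>, Int3, print() and user_print() come from the text; the member name value and the check_crtp() helper are assumptions.

```cpp
#include <cassert>
#include <string>

// CRTP base: the derived type T is known at compile time, so print() can
// forward to T::user_print() with a plain, inlinable call.
template <class T>
struct BaseData {
  std::string print() const {
    return static_cast<const T&>(*this).user_print();
  }
};

// Int3 passes itself as the template argument.
struct Int3 : BaseData<Int3> {
  explicit Int3(int v) : value(v) {}
  std::string user_print() const { return std::to_string(value); }
  int value;
};

// The base class is empty, so no vtable pointer is stored.
static_assert(sizeof(Int3) == sizeof(int), "Int3 is as small as a plain int");

// Hypothetical helper: the interface call reaches user_print().
inline void check_crtp() {
  assert(Int3(42).print() == Int3(42).user_print());
}
```

If user_print() were missing from Int3, the first use of print() would fail to compile, which is the compile-time check mentioned above.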

Little heads up

You may have noticed that the function in Int3 is named user_print(), which differs from print() in BaseData<Int3>.

This is because in CRTP you MUST use different function names in the base class and the derived class (in our example, print() and user_print()).

Otherwise, a derived class that forgets to provide its own implementation still compiles: the call inside the base class resolves back to the base function itself, which recurses forever and typically ends in a segmentation fault.

Generic programming vs virtual functions

The CRTP style of programming is actually the foundation of the generic programming style in C++, where our software works with generic types rather than one specific type.

For example, suppose we want a print_object() function that calls the corresponding print() function on any object that inherits the BaseData interface. We can write:

template<class T>
std::string print_object(const BaseData<T>& a) {
  return a.print();
}

Written this way, the function asks the compiler to check at compile time whether the argument inherits from BaseData<T>.

Note that BaseData<T> in this function template now acts as a generic type: any type that inherits from BaseData<T> is accepted here, rather than one specific type.
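Here is a sketch of print_object() used with two unrelated types. Int3 follows the article; Text3 and the member names are hypothetical and exist only to show that the same template accepts any type implementing the interface.

```cpp
#include <cassert>
#include <string>
#include <utility>

template <class T>
struct BaseData {
  std::string print() const {
    return static_cast<const T&>(*this).user_print();
  }
};

struct Int3 : BaseData<Int3> {
  explicit Int3(int v) : value(v) {}
  std::string user_print() const { return std::to_string(value); }
  int value;
};

// Hypothetical second type implementing the same interface.
struct Text3 : BaseData<Text3> {
  explicit Text3(std::string t) : text(std::move(t)) {}
  std::string user_print() const { return text; }
  std::string text;
};

// T is deduced from the argument's BaseData<T> base class. A type that
// does not inherit from some BaseData<T> is rejected at compile time.
template <class T>
std::string print_object(const BaseData<T>& a) {
  return a.print();
}

// Hypothetical helper: one template, two concrete types.
inline void check_generic() {
  assert(print_object(Int3(3)) == "3");
  assert(print_object(Text3("abc")) == "abc");
}
```

Each call instantiates a separate print_object<T>, so the dispatch is resolved at compile time and no vtable is involved.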

In generic programming, we care more about the interface of an object, rather than the type of it, or its private content.

The generic interface of a type is provided by its parent classes (a type can have more than one interface, of course), at zero runtime cost.