Avoid virtual functions in C++ (when possible)
Xiahua Liu August 27, 2024 #C++
Virtual functions are a core feature of C++, introduced at the very beginning to enable Run-time Polymorphism. This gives C++ a capability that C lacks natively: the ability to treat different types as a single abstract base type.
However, many developers overlook the "dark side" of this feature. It comes with hidden costs in both memory and performance.
The Cost of Runtime Polymorphism
Let's say you are writing a database application. You decide that every data type must have a print() function that returns a std::string.
The classic OOP approach is to define an abstract base class with a pure virtual print() function. Later, you implement this for specific types, like Int1.
(You can find the example code HERE)
Memory Overhead
The first issue is size. Even though Int1 only holds a single 4-byte int member variable, sizeof(Int1) is 16 bytes on a 64-bit machine.
In contrast, Int2 (which has no virtual functions) takes up only 4 bytes.
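The linked example is not reproduced here, so the following is only a minimal sketch of the setup described above. BaseData, Int1, Int2, and print() come from the text; the member name value, the virtual destructor, and the constant kOverheadBytes are assumptions added for illustration. The size checks are written conservatively because object layout is implementation-defined.

```cpp
#include <cstddef>
#include <string>

// Abstract interface: every data type must provide print().
struct BaseData {
    virtual ~BaseData() = default;
    virtual std::string print() const = 0;
};

// Implements the interface, so each object carries a hidden vptr.
struct Int1 : BaseData {
    int value = 0;  // hypothetical member name
    std::string print() const override { return std::to_string(value); }
};

// Same payload, no virtual functions.
struct Int2 {
    int value = 0;
    std::string print() const { return std::to_string(value); }
};

static_assert(sizeof(Int2) == sizeof(int), "Int2 is just its int");
static_assert(sizeof(Int1) >= sizeof(void*) + sizeof(int),
              "Int1 also stores a vptr");

// Per-object overhead in bytes. On a typical 64-bit ABI this is 12:
// 8 bytes for the vptr plus 4 bytes of alignment padding.
constexpr std::size_t kOverheadBytes = sizeof(Int1) - sizeof(Int2);
```

Because the checks are static_asserts, the sketch fails to compile on any platform where the stated size relationship does not hold.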
Where does this size difference come from?
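If you check the assembly in the constructor of Int1, you'll see the program storing an additional value: vtable for Int1+16. This value is written into the object's vptr (virtual table pointer), which points into the vtable in memory. The +16 offset skips the two header entries (offset-to-top and RTTI pointer) that precede the function slots under the Itanium C++ ABI used by GCC and Clang.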
mov edx, OFFSET FLAT:vtable for Int1+16
mov rax, QWORD PTR [rbp-8]
mov QWORD PTR [rax], rdx # Write 8 bytes (vtable address)
The Int2 constructor, however, simply writes the integer value and returns.
We need this pointer because the static type of a pointer or reference says nothing about the object's actual type at runtime. If you static_cast an Int1* to a BaseData*, the object's vptr still points to Int1's vtable. This is how the program knows to call Int1::print() even when holding a BaseData pointer.
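To make the role of the vptr concrete, here is a small sketch in which the same call site reaches two different implementations. Text1, print_via_base(), and dispatch_example() are hypothetical names that do not appear in the original example; they exist only to show that dispatch follows the object, not the type of the reference.

```cpp
#include <cassert>
#include <string>
#include <utility>

struct BaseData {
    virtual ~BaseData() = default;
    virtual std::string print() const = 0;
};

struct Int1 : BaseData {
    int value;
    explicit Int1(int v) : value(v) {}
    std::string print() const override { return std::to_string(value); }
};

// Hypothetical second type, added only for contrast.
struct Text1 : BaseData {
    std::string value;
    explicit Text1(std::string v) : value(std::move(v)) {}
    std::string print() const override { return "'" + value + "'"; }
};

// The static type here is always BaseData; the vptr stored inside the
// object selects which print() actually runs.
std::string print_via_base(const BaseData& data) {
    return data.print();
}

// Usage sketch: one call site, two implementations.
void dispatch_example() {
    Int1 number(5);
    Text1 text("abc");
    assert(print_via_base(number) == "5");
    assert(print_via_base(text) == "'abc'");
}
```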
But if you are storing billions of these integers in a database, quadrupling your memory usage (4 bytes -> 16 bytes) is a disaster.
Performance Overhead
The second issue is speed. Calling a virtual function requires a "pointer chase":
- Follow the object's vptr to the vtable.
- Look up the correct function address in the table.
- Jump to that address.
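The three steps above can be imitated by hand with plain function pointers. This is only an illustration of the idea: real vtable layouts are ABI-specific, and every name here (Object, VTable, PrintFn, kIntVTable, call_print) is hypothetical.

```cpp
#include <cassert>
#include <string>

struct Object;
using PrintFn = std::string (*)(const Object&);

// Hand-written stand-in for a compiler-generated vtable.
struct VTable {
    PrintFn print;  // one slot per virtual function
};

struct Object {
    const VTable* vptr;  // the hidden pointer a polymorphic object carries
    int value;
};

std::string int_print(const Object& self) {
    return std::to_string(self.value);
}

// One table per type, shared by all objects of that type.
const VTable kIntVTable = {&int_print};

std::string call_print(const Object& obj) {
    const VTable* table = obj.vptr;  // 1. follow the object's vptr
    PrintFn target = table->print;   // 2. look up the function address
    return target(obj);              // 3. jump to that address
}

// Usage sketch.
void call_example() {
    Object obj{&kIntVTable, 42};
    assert(call_print(obj) == "42");
}
```

Each numbered comment is a dependent memory load or an indirect jump, which is where the extra latency of a virtual call comes from.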
This has several side effects:
- No Inlining: The compiler cannot inline the function because it doesn't know which function to call until runtime.
- Optimization Barriers: As seen in the Godbolt example, the compiler optimized away Int2 entirely because it saw the code was unused. It could not do the same for Int1 because the virtual nature implies the function might be called externally.
- Cache Misses: You are jumping to different memory locations (object -> vtable -> code), which hurts CPU cache performance.
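Even using the final keyword, which tells the compiler that no further override exists and so allows it to devirtualize calls made through that type, is not a guaranteed fix: calls through a base pointer or reference can stay virtual, and the object still carries its vptr.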
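As a rough illustration of what final can and cannot do, consider the sketch below. Int1Final, print_direct(), and print_indirect() are hypothetical names. Whether a particular compiler actually devirtualizes the first call depends on the compiler and its optimization settings, so the comments describe what is permitted rather than what is guaranteed.

```cpp
#include <cassert>
#include <string>

struct BaseData {
    virtual ~BaseData() = default;
    virtual std::string print() const = 0;
};

// `final` promises that nothing derives from this class.
struct Int1Final final : BaseData {
    int value;
    explicit Int1Final(int v) : value(v) {}
    std::string print() const override { return std::to_string(value); }
};

// Static type is the final class: the compiler is allowed to call
// Int1Final::print() directly and may inline it.
std::string print_direct(const Int1Final& data) { return data.print(); }

// Static type is only BaseData: the call generally stays virtual.
std::string print_indirect(const BaseData& data) { return data.print(); }

// `final` does not remove the vptr, so the memory cost remains.
static_assert(sizeof(Int1Final) > sizeof(int), "vptr is still stored");

// Usage sketch: both paths give the same result.
void final_example() {
    Int1Final x(9);
    assert(print_direct(x) == "9");
    assert(print_indirect(x) == "9");
}
```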
The Solution: Static Polymorphism (CRTP)
If we want the flexibility of an interface without the runtime cost, we can use the Curiously Recurring Template Pattern (CRTP).
Here is the optimized example code
In the CRTP example, Int3 inherits from BaseData<Int3>. The base class casts this to the derived type at compile time to call the implementation, that is, print() forwards to static_cast<Derived*>(this)->user_print().
The assembly code for Int3 is now identical to Int2. It consumes only 4 bytes and involves no pointer chasing.
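The linked code is not shown here, so this is a minimal sketch of the CRTP arrangement just described. BaseData, Int3, print(), and user_print() follow the text; the member name value, the const qualifiers, and crtp_example() are assumptions.

```cpp
#include <cassert>
#include <string>

// CRTP base: the derived type is a template parameter, so the call below
// is resolved at compile time and no vptr is needed.
template <typename Derived>
struct BaseData {
    std::string print() const {
        return static_cast<const Derived*>(this)->user_print();
    }
};

struct Int3 : BaseData<Int3> {
    int value;  // hypothetical member name
    explicit Int3(int v) : value(v) {}
    std::string user_print() const { return std::to_string(value); }
};

// The empty base adds no storage, so Int3 is as small as a bare int.
static_assert(sizeof(Int3) == sizeof(int), "no hidden vptr");

// Usage sketch: the interface function forwards to the implementation.
void crtp_example() {
    Int3 x(3);
    assert(x.print() == "3");
}
```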
A Warning on Recursive Calls
You might notice that I named the implementation function user_print() instead of print().
In CRTP, it is best practice to distinguish the interface (in the base class) from the implementation (in the derived class).
If the derived class fails to implement user_print(), you get a clear compile-time error. However, if both functions were named print(), and the derived class forgot to implement it, the base class print() would essentially call itself (since static_cast<Derived*>(this)->print() would resolve back to BaseData::print() via inheritance). This causes infinite recursion, leading to a stack overflow/segmentation fault.
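With separate names the compiler already rejects a derived class that lacks user_print(). If you want the error message to say so explicitly, one optional addition is a static_assert in the base class. This is a sketch of that idea, not something taken from the original example: has_user_print is a hypothetical helper trait, and Broken is a deliberately incomplete type.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical trait: true when T has a callable const user_print().
template <typename T, typename = void>
struct has_user_print : std::false_type {};

template <typename T>
struct has_user_print<T, decltype(void(std::declval<const T&>().user_print()))>
    : std::true_type {};

template <typename Derived>
struct BaseData {
    std::string print() const {
        // Derived is a complete type by the time this body is instantiated.
        static_assert(has_user_print<Derived>::value,
                      "Derived must implement user_print()");
        return static_cast<const Derived*>(this)->user_print();
    }
};

struct Int3 : BaseData<Int3> {
    int value;
    explicit Int3(int v) : value(v) {}
    std::string user_print() const { return std::to_string(value); }
};

// Forgot user_print(): calling print() on this type fails to compile
// with the message above instead of recursing at runtime.
struct Broken : BaseData<Broken> {};
```

Merely declaring Broken compiles, because member functions of a class template are only instantiated when used; the error appears at the first call to print().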
Generic Programming
CRTP is a common building block of generic programming in C++. Instead of relying on a common runtime type, we rely on a common interface.
We can write a function template that takes a BaseData<T> reference, so it accepts any type inheriting from BaseData, and returns the std::string produced by print().
Here, the compiler determines the type T at compile time. We get the safety and structure of an interface, but with zero runtime cost.
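A sketch of such a generic function, under the same assumptions as before: print_data() and generic_example() are hypothetical names, and the CRTP types are repeated only so the snippet stands alone.

```cpp
#include <cassert>
#include <string>

template <typename Derived>
struct BaseData {
    std::string print() const {
        return static_cast<const Derived*>(this)->user_print();
    }
};

struct Int3 : BaseData<Int3> {
    int value;
    explicit Int3(int v) : value(v) {}
    std::string user_print() const { return std::to_string(value); }
};

// Accepts any type that inherits from BaseData<T>. T is deduced at compile
// time, so the call to print() is an ordinary, inlinable function call.
template <typename T>
std::string print_data(const BaseData<T>& data) {
    return data.print();
}

// Usage sketch: T is deduced as Int3.
void generic_example() {
    Int3 x(11);
    assert(print_data(x) == "11");
}
```

A type that does not derive from some BaseData<T> is rejected at the call site, which is where the interface-like safety comes from.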