Asynchronous Multi-Thread Design Pattern with C++
Xiahua Liu · October 19, 2024 · #Concurrent Programming #C++

This post does NOT discuss existing high-level async libraries or frameworks in C++. Instead, we explore the fundamental concepts behind the Asynchronous Multi-Thread Design Pattern and provide a raw implementation example.
Unlike Rust, C++ has no built-in language support for asynchronous channels. Consequently, developers new to C++ often struggle to plan their multithreaded architecture effectively.
The Asynchronous Design Pattern
In traditional synchronous programming, parent and child threads are usually interlocked via tight synchronization mechanisms, such as state machines. They often share direct access to the same memory for communication.
The problem with the synchronous pattern is the requirement for meticulous step-by-step synchronization. One mistake can lead to data races or deadlocks. Moreover, as software grows, the synchronization complexity increases exponentially. Because of these serious scalability issues, most modern high-performance systems have moved away from this design.
The asynchronous pattern decouples the execution. The parent thread does not interrupt the child thread; instead, they interact strictly through I/O structures. A typical workflow looks like this:
- Task Submission: The parent thread sends a task data structure to one (or a group of) child threads.
- Processing: A child thread unblocks upon receiving data, processes the task, and generates a result.
- Result Reporting: The child thread sends the result back to the parent and waits for the next task.
- Collection: The parent thread continues its own work. When it requires the result, it checks the incoming queue (blocking if necessary) to retrieve it.
You might wonder: Where are the synchronization steps?
In this pattern, synchronization is handled implicitly by the data structures (queues). Adding complex state management to child threads is generally considered bad practice as it re-introduces unnecessary complexity. Typically, parent and child threads communicate through FIFO (First-In-First-Out) queues, allowing data to be buffered while the reader is busy.
Example C++ Program
Let's look at a concrete example. The main thread wants to distribute math calculations to a group of worker threads (assuming a simple square() calculation).
First, we define a thread-safe FIFO queue:
```cpp
// A minimal thread-safe FIFO channel (reconstructed sketch; the full
// original code is on Godbolt and may differ in details).
#include <condition_variable>
#include <mutex>
#include <queue>

template <typename T>
struct WorkerChannel_T {
    // Append a value to the queue; safe to call from any thread.
    void push(T value) {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push(std::move(value));
    }

    // Wake one thread blocked inside pop().
    void notify() { cv_.notify_one(); }

    // Block until a value is available, then remove and return it.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !queue_.empty(); });
        T value = std::move(queue_.front());
        queue_.pop();
        return value;
    }

private:
    std::queue<T> queue_;
    std::mutex mutex_;
    std::condition_variable cv_;
};
```

Note: Both push() and pop() are protected by mutex_, making instances of this struct safe to share between parent and child threads.
Next, we define the worker thread function. Each worker loops forever: it blocks on pop() until a task arrives, computes the square, and pushes the result to the output channel.
Finally, the main thread orchestrates the workers:
We share the same WorkerChannel_T<float> instance (the input channel) among multiple workers. Because the queue is locked during access, each pop() hands a given task to exactly one worker, so tasks are distributed without duplication.
You can find the full executable code on Godbolt.org.
Comparison to Async Frameworks
If you are familiar with standard async frameworks, you will notice similarities:
- in_ch.push() followed by notify() behaves similarly to creating a Promise or Future.
- out_ch.pop() behaves like an Await: the parent thread blocks on this function call, waiting for results to become available.
We effectively created a tiny async framework using only std::thread and standard library synchronization primitives! In production frameworks, the executor is often a thread pool or coroutines rather than raw system threads, but the logic remains the same.
Scaling the Pattern
As shown above, sharing a FIFO channel among multiple workers is straightforward. Since the queue is mutex-protected, it is safe for all workers to block on one input channel.
However, complexity arises on the consumer side. In our example, the parent thread blocks on a single output channel. What if the parent needs to watch multiple channels simultaneously?
Strategy 1: Serialized Wait (Scatter-Gather)
The most straightforward approach is for the parent to wait for results one by one. While this may sound inefficient, it is actually quite powerful. The parent only waits as long as the slowest task takes to complete.
result1 = out1_ch.pop();
result2 = out2_ch.pop();
result3 = out3_ch.pop();
Once this sequence finishes, the parent is guaranteed to have all data ready for the next processing step. This is ideal when the parent needs a complete set of data before proceeding (often called a "Barrier").
Strategy 2: Wait for First Available
Sometimes, you want the parent thread to react immediately when any result is available, regardless of the order.
The simplest solution is to use a single shared output channel that supports multiple data types (using std::variant) or a common base structure. All child threads push to this single "Result Channel." The parent calls pop() on this single channel and uses std::visit to determine which worker replied and how to process the data.
Alternatively, you can implement a dedicated "Notification Channel." When a child thread finishes, it pushes its ID to the Notification Channel. The parent watches this channel to know which specific data channel is ready to be read.