Revisit Sorting Algorithms in C++
Xiahua Liu November 19, 2024 #C++ #AlgorithmAll kinds of sorting algorithms, re-visited!
Easiest
Bubble Sort
Bubble sort is the easiest sorting algorithm; it can be considered an inferior version of insertion sort.
You should never use bubble sort in real applications. It has no advantages compared to the other sorting algorithms.
The basic concept: traverse the vector and compare [j] and [j-1]; if they are out of order, swap them and continue. Each pass pushes the largest remaining element to the end, so we can shrink the inner loop range as we go.
void
Bucket Sort
Bucket sort is also simple and runs in O(N) time when the value range is bounded and small.
It is optimal when the sorted value deviation is small (and non-negative in this example).
void
Easy But Not Efficient
Insertion Sort
Insertion sort works well when the input is already nearly sorted and is stable.
It is very similar to bubble sort, except each traversal bubbles values from back to front.
void
It works better than bubble sort although they have the same worst time complexity O(N^2). The heap sort in the next category can be considered as the superior version of it.
Shell Sort
Shell sort is a gap-based variant of insertion sort. The gap sequence matters; this example uses a short, fixed sequence for simplicity.
There are things like binary insertion sort, which is conceptually similar but still quadratic in the worst case.
void
Efficient
Merge Sort
Divide and conquer sort; think recursively:
- Break a vector into 2 vectors (same length).
- Sort the 2 vectors independently so they are in order.
- If one of the 2 vector is null, return the other vector directly.
- Merge back the 2 pieces with a 2-pointer merge.
A common optimization is to pre-allocate a buffer vector to hold merged results and avoid repeated allocations.
// buffer is reused to avoid allocations while merging [start, end)
void
Call it like this to ensure the buffer is pre-sized once:
vector<int> buffer;
buffer.;
;
You can avoid clearing and copying by alternately swapping the roles of the source and destination buffers during recursion.
// src is the current read buffer; dst is the write buffer
void
Before calling the optimized version, ensure dst starts as a copy of src and call merge_sort(dst, src, 0, src.size()).
Heap Sort
Heap sort is very efficient for repeatedly getting min or max. It underpins the priority_queue data structure.
Compared to merge sort and quick sort, it has several advantages:
- Requires no extra space, which makes it suitable for large data structures.
- Good for inserting new values at runtime, as well as popping the min/max value.
- An upgraded version of the insertion sort, works well with nearly sorted data.
However it is slower than merge sort in most cases for pure sorting. If all we want is sort an array, merge sort often wins.
After the vector is made into a heap in a certain order (which takes O(N)), reading the min/max value takes O(1).
If the min/max element is popped or overwritten, the maintenance only takes O(log N) time.
Heap sort can be described as first making a vector into a heap, then popping the first element one by one. The pop is done by swapping head and tail elements, reducing the vector size by 1, then sinking the new top element.
Heap itself can be seen as a complete binary tree data structure. The time complexity for heap sort is very stable.
make_heap()
We can define a helper function called sink(). It sinks a vector node to the correct location on the heap.
Then we sink every value from the vector end to the beginning, creating the minimum heap structure.
void
void
pop_heap()
pop_heap() will move the first element to the end of the vector, then do a heap update so that the heap is still valid.
This process can be taken as:
- Swap the begin and end element, and reduce the size of the
vecby 1.- You can either directly remove the popped element or only move the popped item to the end.
- The default C++
pop_heap()will only move the popped item to the end.
- Sink the new top element so the heap is still valid.
- If you only moved at the first step, you need to remember the last element location.
Because we need to swap at first, we need an empty() check.
void
insert_heap()
insert_heap is like reverse of pop_heap(), we append the element to the end of the vector, then bubble up the end element.
Because the bubble-up process doesn't break the heap validity, we don't need to sink afterwards.
For child index at i we can use (i-1)/2 to get the parent index.
void
Quick Sort
Quick sort can be considered as upgraded version of merge sort, although its worst time complexity is O(N^2). It is also based on the divide and conquer concept.
However on average it performs better than the merge sort. And the space complexity is O(N) for the worst case.
The following code uses Hoare partition algorithm, which can be considered as 2 pointer method, and we ping-pong the pivot element to one of the pointer and increment the other pointer, until these 2 pointers across.
Notice it only works when we pick the first element as the pivot element. If random pivot is given, we need to swap the random pivot element and the first element before using this algorithm.
int
void