Sorting Algorithms, From Scratch
Build real intuition for sorting — from the O(n²) trio you can explain in your sleep to the O(n log n) workhorses that run in every standard library. Covers bubble, selection, insertion, merge, and quicksort with the concepts that actually matter.
The problem, stated plainly
You have n items. You want them in order. How you do it determines whether sorting 1 million items takes ~20 million operations or ~1 billion. That's the difference between "fast" and "you went home before it finished."
There are five algorithms worth learning here. Two because they're beautiful to think about. One because it's secretly good in a situation everyone hits. Two because they're what actually runs in every standard library on earth.
The O(n²) trio
All three of these share the same root cause of slowness: they resolve order one adjacent comparison at a time, without ever using what they've already learned to skip work. Let's be specific about each.
Bubble sort — the one everyone learns first and no one should use
Walk left to right. Whenever two adjacent elements are out of order, swap them. Repeat until no swaps happen in a full pass.
def bubble_sort(arr):
n = len(arr)
for i in range(n):
swapped = False
for j in range(0, n - i - 1): # last i elements are already in place
if arr[j] > arr[j + 1]:
arr[j], arr[j + 1] = arr[j + 1], arr[j]
swapped = True
if not swapped: # early exit if already sorted
break
After pass 1, the largest element has "bubbled" to the end. After pass 2, the second-largest has too. Each pass needs one fewer comparison because the right end is done. But in the worst case you still do n-1 + n-2 + ... + 1 = n(n-1)/2 comparisons, which is O(n²).
The early-exit is the one redeeming feature. On an already-sorted array, you do exactly n-1 comparisons and stop. That's O(n). In practice, though, you'll never have a sorted array and not know it — so this doesn't buy much.
Verdict: educational, not usable. Use insertion sort if you're in the O(n²) camp for any reason.
Selection sort — the one that minimizes swaps
Find the minimum in the unsorted region, swap it to the front, repeat.
def selection_sort(arr):
n = len(arr)
for i in range(n):
min_idx = i
for j in range(i + 1, n):
if arr[j] < arr[min_idx]:
min_idx = j
arr[i], arr[min_idx] = arr[min_idx], arr[i] # exactly one swap per pass
Always O(n²) comparisons, no matter the input — there's no early exit possible because you always have to scan the whole unsorted region to find the minimum. But it does exactly n-1 swaps, which matters if swapping is expensive (writing to a slow medium, for example). Basically a niche algorithm for a niche constraint.
Insertion sort — the one that's secretly fast
Take the next unsorted element and insert it into the correct position in the sorted region to its left, shifting elements right to make room.
def insertion_sort(arr):
for i in range(1, len(arr)):
key = arr[i]
j = i - 1
while j >= 0 and arr[j] > key: # shift right until we find where key belongs
arr[j + 1] = arr[j]
j -= 1
arr[j + 1] = key
Think of sorting a hand of playing cards. You pick up one card at a time and slide it left until it's in the right spot. That's insertion sort, exactly.
The secret: on nearly-sorted input, the inner while loop almost never runs. If every element is within k positions of its final place, the total work is O(nk), which collapses to O(n) when k is small. That's not a trivial case — sensor data, log files with minor timestamp skew, or data that's already 95% sorted all hit this. Merge sort and quicksort don't exploit it.
This is why Python's Timsort — the sort that runs every time you call sorted() — uses insertion sort internally on runs of ~64 elements. The constant-factor overhead of recursion and buffer management makes merge sort slower than insertion sort on small n, typically below about 16–20 elements.
The jump to O(n log n)
Here's the core insight: if comparison-based sorting is going to do better than O(n²), each comparison needs to eliminate more than one possibility. The O(n²) algorithms learn "these two elements are in order" and move on — one pair resolved per comparison. The O(n log n) algorithms use divide and conquer: split the input in half, sort each half recursively, combine. Every element now participates in only O(log n) merge or partition steps instead of O(n) passes. That's the theoretical escape hatch — and divide and conquer is a technique worth understanding on its own terms.
See this play out visually:
{ "type": "sorting", "title": "merge sort vs. insertion sort", "data": [8, 3, 6, 1, 9, 2, 7, 5] }
Step through insertion sort first. Count how many times an element gets touched. Then restart with merge sort. The elements get moved fewer times in total because each merge step does useful work on a larger region.
Merge sort — the clean guarantee
Split the array in half. Sort each half recursively. Merge the two sorted halves into one sorted array.
def merge_sort(arr):
if len(arr) <= 1:
return arr
mid = len(arr) // 2
left = merge_sort(arr[:mid])
right = merge_sort(arr[mid:])
return merge(left, right)
def merge(left, right):
result = []
i = j = 0
while i < len(left) and j < len(right):
if left[i] <= right[j]: # <= not < ensures stability
result.append(left[i])
i += 1
else:
result.append(right[j])
j += 1
result.extend(left[i:])
result.extend(right[j:])
return result
The merge step is the clever part. You have two sorted sequences. Walk through both with a pointer each, always picking the smaller front element. Two sorted sequences of total length n merge in O(n) — one pass, no comparisons wasted. The recursion tree has O(log n) levels. Each level does O(n) total work (all the merges at that depth). Total: O(n log n), guaranteed, for every input.
flowchart TD
A["[8, 3, 6, 1, 9, 2, 7, 5]"]
B["[8, 3, 6, 1]"]
C["[9, 2, 7, 5]"]
D["[8, 3]"]
E["[6, 1]"]
F["[9, 2]"]
G["[7, 5]"]
H["[3, 8]"]
I["[1, 6]"]
J["[2, 9]"]
K["[5, 7]"]
L["[1, 3, 6, 8]"]
M["[2, 5, 7, 9]"]
N["[1, 2, 3, 5, 6, 7, 8, 9]"]
A --> B & C
B --> D & E
C --> F & G
D --> H
E --> I
F --> J
G --> K
H & I --> L
J & K --> M
L & M --> N
style N fill:#00ff9d,color:#0a0a0f
style A fill:#7c5cff,color:#fff
The catch is memory: merging requires scratch space. The naive implementation above allocates O(n) extra. In-place merging algorithms exist but they're complex and have bad constants in practice. Merge sort is the go-to when you need a stable, worst-case O(n log n) sort and can afford the extra memory — sorting linked lists, external sorting (data too big for RAM), or anywhere the stable guarantee matters.
Stability means equal elements come out in the same order they went in. The <= in the merge comparison is what creates it: left-side elements win ties, so earlier elements stay earlier. Change it to < and you break stability.
Quicksort — the one that actually wins on real hardware
Pick a pivot. Partition: elements smaller than pivot go left, larger go right. Recurse on each side.
def quicksort(arr, lo=0, hi=None):
if hi is None:
hi = len(arr) - 1
if lo >= hi:
return
p = partition(arr, lo, hi)
quicksort(arr, lo, p - 1)
quicksort(arr, p + 1, hi)
def partition(arr, lo, hi):
# randomize pivot to avoid O(n²) on sorted input
import random
pivot_idx = random.randint(lo, hi)
arr[pivot_idx], arr[hi] = arr[hi], arr[pivot_idx]
pivot = arr[hi]
i = lo - 1
for j in range(lo, hi):
if arr[j] <= pivot:
i += 1
arr[i], arr[j] = arr[j], arr[i]
arr[i + 1], arr[hi] = arr[hi], arr[i + 1]
return i + 1
After one partition step, the pivot is in its final sorted position — forever. That's the insight. Every partition puts at least one element in its correct place and divides the remaining work.
Average case: a random pivot splits the array roughly in half each time. O(log n) levels of recursion, O(n) work per level — same logic as merge sort. O(n log n) average.
Worst case: if you always pick the smallest or largest element as pivot — which happens if you naively pick the first element on an already-sorted input — each partition peels off exactly one element. n + (n-1) + (n-2) + ... = O(n²). This is why the random pivot in the code above isn't optional.
Why quicksort beats merge sort on real hardware despite having a worse worst-case? Two reasons:
- In-place. No O(n) buffer. All swaps happen in the original array.
- Cache behavior. Partitioning walks the array linearly. Merging reads from two separate memory regions. On modern hardware where a cache miss costs ~100× a cache hit, that difference is huge. On random data, quicksort's cache behavior is substantially better.
The practical gap is real: on arrays of 1 million random integers, a tuned quicksort is typically 2–3× faster than merge sort in wall-clock time, even though the Big-O is identical. Big-O doesn't capture constant factors, and this is a case where they dominate.
Two concepts that will come up in every interview
Stability
A sort is stable if equal elements preserve their original relative order. Imagine sorting this list of employees by department:
(Alice, Engineering), (Bob, Marketing), (Carol, Engineering)
A stable sort by department gives: (Alice, Engineering), (Carol, Engineering), (Bob, Marketing). Alice still comes before Carol — same relative order as the input. An unstable sort might flip them.
This only matters when you sort objects with multiple fields, or when you chain sorts. But when it matters, it really matters — violating stability creates bugs that are almost impossible to track down.
- Stable: bubble sort, insertion sort, merge sort (with the
<=in merge) - Not stable: selection sort, classic quicksort, heapsort
In-place
A sort is in-place if it uses O(1) extra memory (or O(log n) if you count the recursion stack). The only reason to care is memory. On 10 million 64-bit integers, merge sort's O(n) buffer costs 80 MB you might not have. Quicksort needs nothing but the call stack.
| Algorithm | In-place | Stable |
|---|---|---|
| Bubble sort | Yes | Yes |
| Selection sort | Yes | No |
| Insertion sort | Yes | Yes |
| Merge sort | No | Yes |
| Quicksort | Yes | No |
No algorithm in this list is both in-place and stable with O(n log n) performance. That's not an accident — it's a known theoretical tension. Timsort gets very close by being stable and using O(n) space in the worst case but O(1) on already-sorted data.
The part you will actually use
Here's the honest version of this lesson: you will almost never implement a sort yourself. Every language ships with a production-tuned sort that's better than anything you'd write in an interview.
- Python:
sorted()andlist.sort()use Timsort — merge sort + insertion sort hybrid, stable, O(n log n). - Java:
Arrays.sort()on primitives uses dual-pivot quicksort; on objects, Timsort. - C++:
std::sortuses introsort — quicksort with a heapsort fallback to guarantee O(n log n) worst case. - JavaScript: V8's
Array.prototype.sortuses Timsort since Node 11.
The question "which sort do you use?" has a one-word answer in practice: sort(). The follow-up that reveals whether you actually understand sorting: "when would the built-in sort not be fast enough?"
Answer: when you have a constraint the generic sort ignores. Maybe you need O(1) extra space (merge sort's buffer is a problem). Maybe you know your data is nearly sorted (insertion sort would be O(n) here; Timsort already exploits this, but a hand-rolled solution might do even better). Maybe you need to sort by a key that's expensive to compute and you want to minimize key evaluations (a Schwartzian transform, not a different sort algorithm).
So why learn all of this? Because when sorting shows up in an algorithm problem — and it does, constantly — you need to know whether a sort dominates your complexity or is a cheap preprocessing step. Sorting n items then doing O(log n) lookups is O(n log n) total. Sorting inside a loop is O(n² log n), which is a catastrophe. You can only reason about this if you know what sorting costs. See searching for exactly this pattern: sort once, binary search many times.
What to internalize
Not the code. The shape of the idea.
- O(n²) algorithms are local and myopic. They compare nearby elements and forget what they learned. They're simple, in-place, and fast on tiny n.
- O(n log n) algorithms are global and recursive. They split the problem, solve the halves, and combine — each combination step does useful work on a large chunk of data.
- Insertion sort has a superpower: it's O(n) on nearly-sorted input. That's why it lives inside Timsort.
- Merge sort has a superpower: it's stable and guaranteed O(n log n). That's why it handles objects in Java and Python.
- Quicksort has a superpower: it's fast in practice because it's in-place and cache-friendly. Randomize the pivot or you will regret it.
The comparison-based sorting lower bound is Ω(n log n) — you cannot do better without knowing something about the values (like that they're integers in a bounded range, which unlocks counting sort and radix sort). Everything in this article is hitting right up against that limit. That's a useful sanity check: when someone claims their comparison sort is O(n), they're wrong.
Next up: searching — which starts with the question "what does being sorted buy you?"
Frequently asked questions
▸What is the difference between a stable and unstable sorting algorithm?
A stable sort preserves the original relative order of equal elements. If two records both have the key "42", they come out in the same order they went in. Merge sort is stable; classic quicksort is not. Stability matters when you sort by multiple criteria — sort by name first, then by age, and you need the first sort to survive the second.
▸What does in-place mean for a sorting algorithm?
An in-place sort rearranges the elements without allocating a second array — it uses O(1) or O(log n) extra memory. Bubble, selection, insertion, and heapsort are in-place. Merge sort is not: it needs a temporary buffer of size n to merge the two halves. Quicksort is in-place (the call stack counts as O(log n) extra space in the average case).
▸Why is insertion sort faster than merge sort on small arrays?
Big-O hides the constant factor. Merge sort on 8 elements does recursion setup, boundary checks, and memory writes to a scratch buffer. Insertion sort on 8 elements is a tight inner loop with excellent cache behavior. The crossover point is usually around 10–16 elements, which is why many production sort implementations (like Python's Timsort) fall back to insertion sort for small subarrays.
▸What is the worst-case scenario for quicksort and how do you avoid it?
If you always pick the smallest or largest element as the pivot — which happens on an already-sorted input when you use the first or last element — you get O(n²) because each partition peels off just one element. The fix is randomized pivot selection (pick a random index) or the median-of-three heuristic (take the median of the first, middle, and last elements). Production quicksort implementations always randomize.
▸Which sorting algorithm should I use in practice?
You should almost never choose one yourself — just call the sort function in your language. Python uses Timsort (merge sort + insertion), Java uses dual-pivot quicksort for primitives and Timsort for objects, and C++ uses introsort (quicksort that falls back to heapsort). These are battle-tested and tuned. The only time you pick your own is under an unusual constraint: stable sort required, in-place required with O(1) extra space, or a nearly-sorted input you want to exploit.