◆◆Intermediateasked at Amazonasked at Googleasked at Metaasked at Uber

The Top-K Elements Pattern

How a heap of size k gives you the k largest (or smallest, or most frequent) items in O(n log k) — and why a min-heap is the counterintuitive tool for finding the biggest things.

11 min read2026-01-20Ironclad Academy

#heaps #priority-queues #hash-maps #patterns #sorting

// DEPTH

the full breakdown — requirements, capacity, evolution, trade-offs

Why not just sort?

The obvious answer to "find the 10 highest-paid employees in a billion-row table" is ORDER BY salary DESC LIMIT 10. In SQL that's fine — the database has smarter tricks. But when you're writing the algorithm yourself, sorting is O(n log n) and you're paying for a fully sorted result you never asked for. You just wanted 10 things.

The heap approach costs O(n log k). With n = 10⁹ and k = 10, that's roughly 10⁹ × log(10) ≈ 3.3 × 10⁹ operations versus 10⁹ × log(10⁹) ≈ 30 × 10⁹ for sorting — an order of magnitude difference. And crucially, the heap uses O(k) space while sorting a billion elements in memory is simply not an option.

There's another reason beyond performance: streams. If your data arrives one element at a time — sensor readings, log lines, transactions — you can't sort what you haven't seen yet. The heap processes each element once, in order, and at any moment the heap contains the best k candidates seen so far.

The core pattern

Here's the skeleton. Everything in this article is a variation of this 15-line template.

import heapq

def top_k_largest(nums: list[int], k: int) -> list[int]:
    heap = []                          # min-heap, at most k elements
    for num in nums:
        heapq.heappush(heap, num)
        if len(heap) > k:
            heapq.heappop(heap)        # evict the weakest candidate
    return sorted(heap, reverse=True)  # heap has exactly k elements

The tell that a problem wants this pattern: some fixed count k of extremes, from data you iterate once. The template stays the same — what changes is how you define "better" (value, frequency, distance) and what you push onto the heap (raw value, tuple, negated value).

Worked problem 1: kth largest element

Problem: Find the kth largest element in an array. Not the kth distinct — just the kth in sorted descending order.

This is literally just the top-k pattern, and the answer is heap[0] at the end — the minimum of your k largest candidates, which is exactly rank k.

import heapq

def find_kth_largest(nums: list[int], k: int) -> int:
    heap = []
    for num in nums:
        heapq.heappush(heap, num)
        if len(heap) > k:
            heapq.heappop(heap)
    return heap[0]   # the root is the kth largest (smallest of the top-k)

# Example: nums = [3, 2, 1, 5, 6, 4], k = 2
# Top-2 largest: {5, 6} → heap root = 5 ✓

Complexity: O(n log k) time, O(k) space. The heap never holds more than k+1 elements — you push then immediately pop if over.

The quickselect alternative: For this specific problem (single element, not a stream, fits in memory), quickselect runs in O(n) average. Python's statistics.quantiles or a hand-rolled partition-based selection would beat the heap here in the average case. But the heap is simpler, handles streams, and has no worst-case trap. In an interview, mentioning quickselect as a follow-up shows depth — implement the heap first.

Worked problem 2: top-k frequent elements

Problem: Given a list of words (or numbers), return the k most frequent ones.

This is where the pattern composes with a hash table. You need two steps:

Count frequencies — one pass with a hash map, O(n).
Find top-k by frequency — heap of size k keyed by frequency, O(n log k).

import heapq
from collections import Counter

def top_k_frequent(words: list[str], k: int) -> list[str]:
    freq = Counter(words)              # O(n) — word → count

    # Min-heap of (count, word). We compare by count first.
    # Min-heap → smallest count at root → we evict the least frequent.
    heap = []
    for word, count in freq.items():
        heapq.heappush(heap, (count, word))
        if len(heap) > k:
            heapq.heappop(heap)        # drop the least frequent

    # Sort descending by count
    return [word for count, word in sorted(heap, reverse=True)]

One subtlety: if two words have the same frequency and the problem asks for lexicographic tiebreaking, Python's tuple comparison handles it automatically — (count, word) will compare word alphabetically when counts tie. For "return in reverse lex order," negate or swap: (count, [-ord(c) for c in word]) — ugly, but it works.

The clean Python shortcut:

heapq.nlargest(k, freq.keys(), key=freq.get)

Know this exists, but in an interview explain the manual approach. The nlargest function does the same heap-of-k internally.

Let's see the heap state mid-stream so the lesson sticks:

{ "type": "heap", "variant": "min", "title": "top-3 frequent: heap holds (count, word) pairs", "data": [2, 3, 5] }

The root (2) is the count of the least frequent word currently in our top-3. Next word with count 1? Turned away — smaller than the root. Word with count 4? In: pop 2, push 4, sift up.

Worked problem 3: k closest points to origin

Problem: Given a list of 2D points, return the k closest to the origin (0, 0).

Now "better" means smaller Euclidean distance. Push (distance², x, y) tuples — notice the min-heap here means we want the k smallest distances, so the pattern flips slightly.

Wait, does it flip? Think again. We still want a heap that evicts the worst candidate. The worst candidate among the k closest is the farthest one. So we need a max-heap on distance.

Since Python only has a min-heap, negate the distance:

import heapq

def k_closest(points: list[list[int]], k: int) -> list[list[int]]:
    heap = []  # max-heap by distance → store (-dist², x, y)
    for x, y in points:
        dist_sq = x * x + y * y        # skip sqrt — relative order is the same
        heapq.heappush(heap, (-dist_sq, x, y))
        if len(heap) > k:
            heapq.heappop(heap)        # evicts farthest point in our window

    return [[x, y] for _, x, y in heap]

Why dist_sq instead of the actual distance? Because sqrt is monotone — if a² < b² then a < b for non-negative values. Skipping sqrt saves a floating point operation per element and avoids precision issues.

Why negate? Python's min-heap pops the smallest value. Negating distance turns (-10, ...) into a smaller value than (-3, ...), so the farthest point (largest distance, most negative when negated) gets popped first. That's exactly the eviction behavior we want.

Worked problem 4: merge k sorted lists

This one feels different but it's the same heap pattern in a different costume. You have k sorted arrays and want to merge them into one sorted output.

The naive approach concatenates all arrays and sorts — O(N log N) where N is total elements. The heap approach is O(N log k) — and when k is small (like 100 sorted shards of a distributed sort), the savings are real.

import heapq

def merge_k_sorted(lists: list[list[int]]) -> list[int]:
    result = []
    # Heap entries: (value, list_index, element_index)
    heap = []

    # Seed: first element from each non-empty list
    for i, lst in enumerate(lists):
        if lst:
            heapq.heappush(heap, (lst[0], i, 0))

    while heap:
        val, list_idx, elem_idx = heapq.heappop(heap)
        result.append(val)
        # Push this list's next element, if any
        next_idx = elem_idx + 1
        if next_idx < len(lists[list_idx]):
            heapq.heappush(heap, (lists[list_idx][next_idx], list_idx, next_idx))

    return result

The heap never holds more than k elements. Every element gets pushed and popped exactly once — O(log k) per operation, O(N log k) total.

This exact pattern powers merge phases in external sort algorithms, database query plan execution, and anywhere you're combining presorted shards. If you ever work on distributed data pipelines, you'll write some version of this.

flowchart TD
    subgraph "Three sorted lists"
        L1["[1, 4, 7]"]
        L2["[2, 5, 8]"]
        L3["[3, 6, 9]"]
    end
    H["min-heap\n{(1,L1), (2,L2), (3,L3)}"]
    OUT["output: 1 → 2 → 3 → 4 → ..."]
    L1 -->|"seed"| H
    L2 -->|"seed"| H
    L3 -->|"seed"| H
    H -->|"pop min, push successor"| OUT
    style H fill:#7c5cff,color:#fff
    style OUT fill:#00ff9d,color:#0a0a0f

The heap vs. quickselect trade-off

You'll hear about quickselect as the "optimal" solution for kth-largest. Let's be precise about the trade-off.

	Heap of size k	Quickselect
Time (average)	O(n log k)	O(n)
Time (worst case)	O(n log k)	O(n²)
Space	O(k)	O(1) — in-place
Handles streams	Yes	No
Gives sorted top-k	Yes (extra O(k log k))	No
Simplicity	Simple	Medium (partition logic)

Quickselect's O(n) average is real, but so is the O(n²) worst case. With naive pivot selection (e.g., always pick the last element), an adversary can construct an input — or you might just get unlucky — that degrades to quadratic. The median-of-medians algorithm guarantees O(n) worst case but with a large constant and real complexity to implement.

My recommendation: in an interview, implement the heap first and mention quickselect as a follow-up trade-off. In production code where k is large (k > n/10) and the data fits in memory, quickselect is worth considering — Python's heapq.nlargest actually switches to sorted() when k > n/2, which is essentially this reasoning made concrete.

For streaming data, the heap is the only option.

Complexity summary

Problem	Time	Space	Notes
Top-k largest / smallest	O(n log k)	O(k)	heap never exceeds k
Kth largest	O(n log k)	O(k)	answer is heap[0] at end
Top-k frequent	O(n log k)	O(n)	O(n) for the count map
K closest points	O(n log k)	O(k)	negate distance for max-heap
Merge k sorted lists	O(N log k)	O(k)	N = total elements across lists
Quickselect (kth)	O(n) avg / O(n²) worst	O(1)	in-place, no streams

Pitfalls and anti-patterns

Forgetting the min/max inversion. This is the most common mistake. To find the top-k largest, use a min-heap. To find the top-k smallest, use a max-heap. If you're about to use a max-heap for top-k-largest, stop and re-read the "waiting room" analogy. It always helps to ask: what am I evicting? The weakest candidate. The heap's root should be the weakest, so you can pop it cheaply.

Pushing all n elements then popping k. This is O(n log n) — the same as sorting, just uglier. The whole point is to keep the heap at size k. Push, check, pop-if-oversized — in that order, on every element.

Sorting the output when you don't need to. The heap gives you k elements in heap order, not sorted order. If the problem asks "return in any order," don't sort. If it asks for the k elements sorted, add one O(k log k) sort at the end — still dominated by the O(n log k) main loop.

Using a plain sort for "small enough" n. sorted(nums)[-k:] works and is fast in Python for small inputs. But the pattern matters more than the microoptimization in a learning context, and interviewers are checking whether you know the pattern. Show the heap approach first.

Mutating the count map mid-iteration. In the top-k frequent variant, build the Counter once, then iterate counter.items(). Don't update counts while iterating the heap phase — that's not what this pattern does.

When NOT to use this pattern

The top-k pattern has clear boundaries.

You need exact rank access. The heap gives you the k best, but heap[1] is not the second best — the heap only guarantees the root is the minimum (or maximum). If you need to query "give me the 7th element" cheaply after the fact, you want a sorted array.

k is close to n. If you're finding the top 90% of elements, you might as well sort. The heap's advantage shrinks as k approaches n. Python's heapq.nlargest uses sorted() when k > n/2 for exactly this reason.

You need to update priorities dynamically. Once an element is inside the heap, you can't cheaply update its key. If your use case requires changing an element's priority after insertion (e.g., Dijkstra's algorithm adjusting path costs), the standard workaround is lazy deletion — mark stale entries and skip them on pop. See the heap article for the lazy-deletion pattern.

You need the result in sorted order frequently. Top-k gives you k elements in heap order. If your downstream code needs them sorted every time, and you're calling this many times, a different data structure (sorted list with bisect, or an order-statistics tree) might be cleaner.

The problem is really a divide-and-conquer. Some "top-k" questions are better framed as partitioning — see divide and conquer for when splitting the problem space beats scanning with a heap.

The top-k pattern is one of those clean ideas that appears in disguise constantly — leaderboards, recommendation systems, log analysis, sensor stream aggregation. Once you see "I need the k best of something from a lot of data," the min-heap-of-size-k template should fire immediately. The one thing to tattoo on your brain: min-heap for top-k largest. It's backward on purpose. The root is your eviction candidate — always keep your worst current candidate cheap to access.

For a full picture of the heap mechanics underpinning this pattern, read the heap article. For counting frequencies before ranking, the hash table covers the Counter internals.

// KEY TAKEAWAYS

▸A min-heap of size k finds the k largest elements in O(n log k) — counterintuitive but correct, because the min-heap lets you cheaply evict the weakest candidate.
▸Top-k frequent requires a hash map for counts followed by the standard heap pattern — the hash map and heap compose cleanly.
▸Quickselect is O(n) average but O(n²) worst case and requires all data in memory; the heap handles streams and guarantees O(n log k).
▸Merge k sorted lists with a min-heap of size k: seed with one element from each list, always pop the smallest and push its successor — O(N log k) where N is total elements.
▸The tell for this pattern: the problem asks for some fixed count k of extremes (largest, smallest, closest, most frequent) from a collection you can only iterate once or that arrives as a stream.

// FAQ

Frequently asked questions

▸Why do you use a min-heap to find the k largest elements?

It sounds backwards until you think about what the heap is doing. You're maintaining a window of the k best candidates seen so far. The min-heap lets you peek at the weakest member of that window in O(1) — so for each new element you can instantly ask 'is this better than my worst current candidate?' If yes, evict the weakest (heap pop) and add the new one (heap push). A max-heap would give you the best candidate instantly, but you'd have no cheap way to find and evict the worst one. The min-heap of size k is a bouncer who always kicks out the weakest link.

▸What is the time complexity of the top-k pattern?

O(n log k) time, O(k) space. For each of the n elements you do at most one push and one pop, each costing O(log k) because the heap never grows past size k. When k is much smaller than n — say top-10 in a billion-element stream — this is dramatically better than O(n log n) sorting. When k equals n you're effectively sorting, so the two approaches converge.

▸When should I use quickselect instead of a heap for top-k?

Quickselect runs in O(n) average time, which beats the heap's O(n log k) when k is large relative to n. But quickselect has two big catches: it's O(n²) in the worst case unless you use median-of-medians (which is complex), and it requires all elements in memory at once. Use a heap when the data is a stream (you can't go back), k is small, or you need the actual top-k sorted. Use quickselect when the data fits in memory, you only need one element (the kth), and average-case performance is acceptable.

▸What is the difference between the kth largest and top-k largest problems?

They're related but distinct. 'Top k largest' means return all k elements; 'kth largest' means return exactly the element at rank k. Both can be solved with a min-heap of size k — after processing all elements, the top-k is everything left in the heap, and the kth largest is the heap root (the minimum of those k elements, which is exactly rank k). The heap approach is the same for both.

▸How do I handle top-k by frequency instead of value?

Two-step: first count frequencies with a hash map in O(n). Then run the top-k pattern on (frequency, element) pairs using a min-heap of size k keyed by frequency. Total time is O(n log k). In Python you can shortcut this with heapq.nlargest(k, counter.keys(), key=counter.get) but it's worth knowing the manual approach since that's what an interview expects you to explain.

← previous

Topological Sort

Subsets, Permutations & Combinations

// RELATED