Topological Sort
Order the nodes of a DAG so every edge points forward — the algorithm that drives build systems, package managers, course schedulers, and anything else that lives and dies by dependency ordering.
The mental model: dependency chains as arrows
The framing that makes everything click: draw an arrow from every prerequisite to whatever depends on it. Course A → Course B means "you can't take B without A." Library X → Library Y means "Y must be installed before X can build." Task Open_DB → Task Query_DB means "query can't start before the connection is open."
Once you've drawn those arrows, you have a directed graph. If you can topologically sort it, you have your execution order. If you can't — because the graph has a cycle — you've also found your bug, because someone wrote a circular dependency.
The key constraint is that the graph must be a DAG — directed and acyclic. "Directed" just means the edges have direction (A depends on B is not the same as B depends on A). "Acyclic" means no path leads back to where it started. The moment you get a cycle, the whole ordering breaks down.
flowchart LR
A["course: Math"] --> C["course: Linear Algebra"]
B["course: Stats"] --> C
C --> D["course: ML"]
C --> E["course: Data Science"]
B --> E
style A fill:#7c5cff,color:#fff
style B fill:#7c5cff,color:#fff
style C fill:#22d3ee,color:#0a0a0f
style D fill:#00ff9d,color:#0a0a0f
style E fill:#00ff9d,color:#0a0a0f
Two valid orderings here: [Math, Stats, Linear Algebra, ML, Data Science] and [Stats, Math, Linear Algebra, Data Science, ML], among others. Math and Stats have no dependency on each other, so their relative order is unconstrained.
Kahn's algorithm: peeling the onion
Kahn's is intuitive if you think about it from first principles. A node with in-degree zero has no prerequisites — it can go first, immediately. Remove it from the graph. Now some other nodes might have lost their last prerequisite, pushing their in-degree to zero. Peel those next. Repeat until nothing is left. The order in which you peeled the nodes is a valid topological ordering.
Here's the thing that makes this beautiful: if at any point you're stuck — there are still nodes in the graph but none have in-degree zero — those nodes are all part of a cycle. That's your free cycle detector.
from collections import deque, defaultdict
def topological_sort_kahn(n, edges):
"""
n: number of nodes (0 to n-1)
edges: list of (u, v) meaning u must come before v
Returns: topological ordering, or [] if cycle detected
"""
adj = defaultdict(list)
in_degree = [0] * n
for u, v in edges:
adj[u].append(v)
in_degree[v] += 1
# seed the queue with all zero-in-degree nodes
queue = deque(i for i in range(n) if in_degree[i] == 0)
order = []
while queue:
node = queue.popleft()
order.append(node)
for neighbor in adj[node]:
in_degree[neighbor] -= 1
if in_degree[neighbor] == 0:
queue.append(neighbor)
# if we didn't process every node, a cycle exists
return order if len(order) == n else []
Walk through it step by step on the course graph above:
- In-degrees: Math=0, Stats=0, Linear Algebra=2, ML=1, Data Science=2.
- Queue starts with [Math, Stats].
- Dequeue Math → append to order → decrement Linear Algebra's in-degree to 1.
- Dequeue Stats → append to order → decrement Linear Algebra to 0, Data Science to 1 → enqueue Linear Algebra.
- Dequeue Linear Algebra → decrement ML to 0, Data Science to 0 → enqueue both.
- Dequeue ML, then Data Science.
- Order: [Math, Stats, Linear Algebra, ML, Data Science]. All 5 nodes → no cycle.
The time cost: you touch every node once when you enqueue it, and every edge once when you decrement in-degrees. That's O(V + E), and you can't do better since you have to look at every edge at least once.
DFS post-order: the elegant alternative
The DFS approach flips the logic. Instead of asking "what has no prerequisites?", it asks "what has no dependents?" — and works backward. You run DFS and push each node onto a stack only after all nodes it can reach have already been pushed. When you're done, pop the stack left to right and you have the topological order.
Why does post-order give you the right answer? Because in a DFS, you finish a node only after fully exploring everything downstream from it. So if A → B, you finish B before you finish A — meaning B gets pushed to the stack first, A gets pushed second. Pop them off and A comes out before B. Exactly the ordering you wanted.
def topological_sort_dfs(n, edges):
adj = defaultdict(list)
for u, v in edges:
adj[u].append(v)
# 0 = unvisited, 1 = in-progress, 2 = done
state = [0] * n
result = []
has_cycle = [False]
def dfs(node):
if has_cycle[0]:
return
state[node] = 1 # mark in-progress
for neighbor in adj[node]:
if state[neighbor] == 1: # back edge → cycle
has_cycle[0] = True
return
if state[neighbor] == 0:
dfs(neighbor)
state[node] = 2 # mark done
result.append(node) # post-order: push after subtree is done
for i in range(n):
if state[i] == 0:
dfs(i)
if has_cycle[0]:
return []
return result[::-1] # reverse for correct topological order
The three-state trick (unvisited / in-progress / done) is what gives you cycle detection. If DFS ever steps onto a node that's in-progress — already on the current call stack — you have a back edge, which means a cycle. A node marked done is safe to re-encounter; you've already fully processed it.
Which algorithm should you use in an interview? Kahn's is more self-documenting — the examiner can follow the BFS loop without knowing the DFS post-order insight. DFS post-order is slightly cleaner if you're already maintaining a recursive DFS structure for other reasons. Both are O(V + E). Pick one, commit to it, and know it cold.
Core problems: same shape, different costumes
Course Schedule I — does a valid ordering exist?
You have
numCoursescourses and a list of prerequisites[a, b]meaning "you must take b before a." Return true if you can finish all courses.
This is "does the directed graph have a cycle?" — nothing more. Run Kahn's; if the output length equals numCourses, return True.
def can_finish(numCourses, prerequisites):
adj = defaultdict(list)
in_degree = [0] * numCourses
for a, b in prerequisites: # b must precede a
adj[b].append(a)
in_degree[a] += 1
queue = deque(i for i in range(numCourses) if in_degree[i] == 0)
count = 0
while queue:
node = queue.popleft()
count += 1
for neighbor in adj[node]:
in_degree[neighbor] -= 1
if in_degree[neighbor] == 0:
queue.append(neighbor)
return count == numCourses
Time: O(V + E). Space: O(V + E). Clean, no tricks.
Course Schedule II — return the actual ordering
Same graph, now return the order itself (or [] if impossible). Just collect the nodes as Kahn's processes them.
def find_order(numCourses, prerequisites):
adj = defaultdict(list)
in_degree = [0] * numCourses
for a, b in prerequisites:
adj[b].append(a)
in_degree[a] += 1
queue = deque(i for i in range(numCourses) if in_degree[i] == 0)
order = []
while queue:
node = queue.popleft()
order.append(node)
for neighbor in adj[node]:
in_degree[neighbor] -= 1
if in_degree[neighbor] == 0:
queue.append(neighbor)
return order if len(order) == numCourses else []
I'll keep repeating this because it's worth hammering home: the only difference between "detect cycle" and "return ordering" is whether you return a boolean or the accumulated list. The graph construction is identical.
Alien Dictionary — the tricky build-order cousin
This one trips people up because the graph isn't handed to you — you have to construct it. You're given a list of words in an alien language sorted lexicographically. Infer the ordering of letters.
The key insight: compare adjacent words character by character. The first position where they differ tells you one fact: that letter comes before the other in the alien alphabet. Once you have all those ordering constraints, the rest is Kahn's.
def alien_order(words):
# initialize: every character that appears is a node
adj = {c: [] for word in words for c in word}
in_degree = {c: 0 for word in words for c in word}
# extract ordering constraints from adjacent word pairs
for i in range(len(words) - 1):
w1, w2 = words[i], words[i + 1]
min_len = min(len(w1), len(w2))
# edge case: ["abc", "ab"] is invalid (prefix appears after full word)
if len(w1) > len(w2) and w1[:min_len] == w2[:min_len]:
return ""
for j in range(min_len):
if w1[j] != w2[j]:
adj[w1[j]].append(w2[j])
in_degree[w2[j]] += 1
break # only the first difference matters
# standard Kahn's
queue = deque(c for c in in_degree if in_degree[c] == 0)
result = []
while queue:
c = queue.popleft()
result.append(c)
for neighbor in adj[c]:
in_degree[neighbor] -= 1
if in_degree[neighbor] == 0:
queue.append(neighbor)
return "".join(result) if len(result) == len(in_degree) else ""
The visualizer below shows the graph you'd build for ["wrt", "wrf", "er", "ett", "rftt"]. The edges encode the partial character ordering the word list implies.
{ "type": "graph", "title": "alien dictionary: character ordering constraints as a DAG" }
After Kahn's on that graph: t → w → e → r → f. That's one valid alien alphabet.
The mistake people make: they try to process all characters at each comparison position instead of stopping at the first difference. Subsequent characters tell you nothing — you can only establish one ordering constraint per adjacent word pair.
Build Order — generalized dependency scheduling
Real build systems aren't just topological sort; they fan out parallelism as much as possible. The classic interview version: given a list of projects and a list of dependencies (a, b) meaning "b depends on a," find a valid build order.
This is the canonical Kahn's setup with a meaningful node label. The one wrinkle interviewers add: projects that have no dependencies on anything can be built first, in any order. Kahn's gives you this for free — all zero-in-degree nodes land in the queue together, and you can build them in parallel.
def find_build_order(projects, dependencies):
"""
projects: list of project names
dependencies: list of (a, b) tuples meaning "a must be built before b"
"""
adj = defaultdict(list)
in_degree = {p: 0 for p in projects}
for a, b in dependencies:
adj[a].append(b)
in_degree[b] += 1
queue = deque(p for p in projects if in_degree[p] == 0)
order = []
while queue:
project = queue.popleft()
order.append(project)
for dep in adj[project]:
in_degree[dep] -= 1
if in_degree[dep] == 0:
queue.append(dep)
if len(order) != len(projects):
raise ValueError("Circular dependency detected")
return order
The pattern recognition tell
When you see any of these in a problem statement, your brain should immediately reach for topological sort:
- "tasks/courses/projects with prerequisites"
- "find an order to complete X given constraints"
- "detect a circular dependency"
- "which packages need to be installed first"
- "serialize a sequence of operations with ordering rules"
The give-away is the combination of items + dependency constraints. Any time you model "X must happen before Y" as a directed edge, you're building the graph for a topological sort. The ordering problem and the cycle-detection problem are literally the same graph traversal.
One underused reframe: in system design interviews, when you're asked how a build pipeline or a data pipeline handles task ordering — topological sort is often the right answer to name explicitly. Build systems, Airflow DAGs, Spark execution plans, and database query optimizers all use topological ordering to sequence steps. Knowing the name and O(V + E) cost signals fluency.
Relationship to other graph algorithms
Topological sort is a first-class member of the DFS patterns family. The DFS post-order version is just DFS with a specific action at the finish step. If you're comfortable with DFS, the only new thing to learn is the three-color state tracking and the stack reversal.
It's also related to union-find, but in a different direction. Union-find detects cycles in undirected graphs cheaply. Topological sort (via DFS or Kahn's) detects cycles in directed graphs as a side-effect. They're complements, not substitutes.
And if you need shortest paths through a DAG (not just ordering), you can run a modified Dijkstra's that processes nodes in topological order rather than by distance — this gives you an O(V + E) shortest path for DAGs, faster than the general O((V + E) log V) Dijkstra's. Worth knowing that topological ordering enables that optimization.
For more on the underlying graph structure and terminology, see graphs.
Pitfalls and anti-patterns
Forgetting to handle disconnected components. If the graph has multiple connected components, you need to seed Kahn's with all zero-in-degree nodes (which the template already does), and the DFS loop must iterate over all nodes (also shown in the template). People who only start DFS from node 0 miss nodes in other components.
Confusing the edge direction. "B requires A" means the edge is A → B (A must come first). Getting this backward builds the graph in reverse and gives you a reversed topological order — or, worse, it looks plausible enough that you don't catch it.
Off-by-one in the cycle check. With Kahn's, the cycle check is len(order) == n. If your graph has n nodes and you process fewer than n, some of them couldn't reach zero in-degree — they're cyclic. Don't check len(queue) == 0; the queue empties normally even when there's a cycle.
Treating the topological order as unique. There are usually many valid orderings. If you get a different order than the expected answer but both are valid, you're correct. Some interviewers will specify "lexicographically smallest order" — in that case, replace the deque with a min-heap and push characters instead of using arbitrary FIFO order.
Applying topological sort to undirected graphs. Topological sort is only defined for directed graphs. An undirected graph edge is bidirectional — the concept of "A before B" doesn't apply. If you see an undirected graph, reach for connected components or union-find instead.
Recursive DFS hitting Python's recursion limit. With V up to 10^5, the DFS post-order version can hit Python's default 1000-frame limit on a linear chain graph. Either use Kahn's (iterative by nature) or set sys.setrecursionlimit explicitly if you must use DFS.
When NOT to use topological sort
| You need this | Use this instead |
|---|---|
| Shortest path in a weighted graph | Dijkstra's (non-negative) or Bellman-Ford |
| Connectivity in an undirected graph | BFS/DFS connected components, union-find |
| Detect cycles in an undirected graph | Union-find or DFS without color tracking |
| All topological orderings (not just one) | Backtracking over zero-in-degree nodes |
| Strongly connected components | Kosaraju's or Tarjan's (topological sort of the SCC DAG after) |
| Shortest path through a DAG specifically | Topological sort + DP — this is actually a valid use case |
The last row deserves emphasis. If you need shortest or longest path through a DAG (e.g., critical path in project scheduling), topological order IS your friend — process nodes in topological order and relax edges in that order. You get O(V + E) instead of Dijkstra's O((V + E) log V). That's a real optimization and it comes up in scheduling problems.
But don't reach for topological sort when the graph has undirected edges, might have cycles legitimately (like a road network), or when you need structural properties that BFS/DFS handle more naturally. And definitely don't implement it when the "dependencies" can be circular by design — that's an entirely different problem category. For general graph exploration, graphs covers the terrain.
Frequently asked questions
▸What is topological sort and when do you use it?
Topological sort produces a linear ordering of the vertices of a directed acyclic graph such that for every directed edge u → v, vertex u appears before v in the ordering. You use it whenever you have dependencies: build systems (compile A before B if B imports A), course prerequisites (take Linear Algebra before Machine Learning), package managers (install libssl before anything that links to it). If the graph has a cycle, no topological ordering exists.
▸What is the difference between Kahn's algorithm and DFS-based topological sort?
Both run in O(V + E) and produce a valid topological ordering. Kahn's algorithm is iterative and BFS-flavored: it repeatedly peels off nodes with in-degree zero (nodes with no pending dependencies). DFS post-order is recursive: finish a node only after fully exploring its subtree, then prepend it to the result. Kahn's makes cycle detection obvious (nodes left in the graph at the end form the cycle). DFS post-order is elegant and easier to bolt onto an existing DFS template. In practice Kahn's is more common in interviews because its logic is self-explanatory.
▸How do you detect a cycle in a directed graph with topological sort?
With Kahn's algorithm, if the output list contains fewer nodes than the graph has, the remaining nodes form one or more cycles — they never had their in-degree reach zero. With DFS, track three states per node: unvisited, in-progress (currently on the call stack), and finished. If DFS ever walks onto an in-progress node, you have a back edge — a cycle.
▸Can a graph have multiple valid topological orderings?
Yes, as long as there are nodes with no dependency on each other, they can appear in any relative order. For example, if A → C and B → C, both [A, B, C] and [B, A, C] are valid orderings. Kahn's algorithm gives you one valid ordering based on which zero-in-degree node you process first (usually by queue insertion order). If you want all valid orderings, you'd enumerate them with backtracking.
▸Why can't you topologically sort a graph that has a cycle?
If A depends on B and B depends on A (directly or through a longer chain), there is no order in which you can place them both such that A comes before B and B comes before A simultaneously. The ordering definition breaks down. This is exactly why package managers error on circular dependencies and why build systems flag dependency cycles as configuration errors.
You may also like
Subsets, Permutations & Combinations
Generate every subset, permutation, and combination with the choose/explore/un-choose backtracking template. Understand the 2^n and n! complexity ceilings, why sorting kills duplicate branches, and how to recognize which variant a problem is actually asking for.
Shortest Paths (Dijkstra, Bellman-Ford, BFS)
Pick the right shortest-path algorithm the first time: BFS for unweighted graphs, Dijkstra with a heap for non-negative weights, Bellman-Ford when negative edges appear. Concrete worked examples, a decision table, and every gotcha that costs people the interview.
Fast & Slow Pointers (Cycle Detection)
Two pointers moving at different speeds — the tortoise and hare — detect cycles, find the middle of a linked list, and locate cycle starts, all in O(1) space. The single most-asked linked list technique in FAANG interviews.