BFS Patterns
Queue-driven level-by-level traversal and why breadth-first search is the only guaranteed way to find shortest paths in unweighted graphs. Templates, traps, and four worked problems.
Why the queue makes it work
Here's a thing a lot of people memorize without understanding: BFS and DFS use different data structures, and the data structure IS the algorithm. Swap the queue for a stack and BFS becomes DFS without changing any other line of code.
Why? A queue is FIFO. When you enqueue a node's neighbors, they go to the back of the line — behind all the other nodes that are already waiting from this level. So you finish the current ring before starting the next. A stack is LIFO. The most recently discovered node jumps to the front, which sends you diving deep before you've finished exploring the current level. Same graph traversal skeleton, totally different behavior.
This is also why BFS finds shortest paths and DFS does not. When you reach the destination for the first time in BFS, every earlier dequeue was a node at a smaller distance. You literally could not have arrived via a shorter route — those nodes would have been dequeued first and found the destination earlier.
DFS might find the destination on its first dive, but that dive could be an arbitrarily long detour. There is no distance guarantee.
The canonical template
Commit this to muscle memory. Every BFS problem is a variation of this.
from collections import deque
def bfs(graph, start, target):
queue = deque([start])
visited = {start}
level = 0 # distance from start
while queue:
# Process one full level at a time
for _ in range(len(queue)): # snapshot current level size
node = queue.popleft()
if node == target:
return level # shortest path length
for neighbor in graph[node]:
if neighbor not in visited:
visited.add(neighbor)
queue.append(neighbor)
level += 1
return -1 # target unreachable
Three things worth calling out:
visited on enqueue, not on dequeue. A common bug is marking nodes visited when you pop them. In a dense graph, that lets you enqueue the same node dozens of times before you ever process it — O(E) redundant entries in the queue instead of O(V). Mark it when you add it.
len(queue) snapshot. The for _ in range(len(queue)) loop processes exactly the nodes at the current level, even as we're appending the next level's nodes to the same queue. It works because len(queue) is evaluated once, at the start of the loop. Clean, no extra data structure needed.
deque, not list. list.pop(0) in Python is O(n) — it shifts every remaining element. deque.popleft() is O(1). Always use deque for BFS. This is one of those bugs that doesn't matter on a 10-node test case and kills you on a 100,000-node input.
Problem 1: Shortest path in a grid
Classic setup: you have a 2D grid of 0s (open) and 1s (walls). Find the shortest path from the top-left corner to the bottom-right corner, moving in four directions.
The key insight — say it out loud: a grid is a graph. Each cell (r, c) is a node. Two cells are connected by an edge if they're adjacent and neither is a wall. BFS on this graph gives you the shortest path in O(rows × cols).
def shortest_path_grid(grid):
rows, cols = len(grid), len(grid[0])
# Edge cases
if grid[0][0] == 1 or grid[rows-1][cols-1] == 1:
return -1
queue = deque([(0, 0)])
visited = {(0, 0)}
steps = 1 # count the start cell
while queue:
for _ in range(len(queue)):
r, c = queue.popleft()
if r == rows - 1 and c == cols - 1:
return steps
for dr, dc in [(-1,0),(1,0),(0,-1),(0,1)]:
nr, nc = r + dr, c + dc
if (0 <= nr < rows and 0 <= nc < cols
and grid[nr][nc] == 0
and (nr, nc) not in visited):
visited.add((nr, nc))
queue.append((nr, nc))
steps += 1
return -1
That four-direction neighbor loop (dr, dc in [(-1,0),(1,0),(0,-1),(0,1)]) is a pattern you'll write so many times it becomes automatic. Eight directions? Just add the four diagonal pairs.
Problem 2: Word Ladder
You have a beginWord, an endWord, and a dictionary. Each step you change exactly one letter; every intermediate word must be in the dictionary. Find the minimum number of transformations.
This one trips people up because the graph is implicit — no adjacency list is given to you. You have to build it on the fly. Two words are neighbors if they differ by exactly one character.
def word_ladder(begin_word, end_word, word_list):
word_set = set(word_list)
if end_word not in word_set:
return 0
queue = deque([begin_word])
visited = {begin_word}
transformations = 1
while queue:
for _ in range(len(queue)):
word = queue.popleft()
if word == end_word:
return transformations
# Generate all words one letter away
for i in range(len(word)):
for ch in 'abcdefghijklmnopqrstuvwxyz':
candidate = word[:i] + ch + word[i+1:]
if candidate in word_set and candidate not in visited:
visited.add(candidate)
queue.append(candidate)
transformations += 1
return 0
The neat move: instead of comparing every pair of words (O(N²) to build the graph), you generate all single-letter mutations of the current word and check if each one is in the dictionary. For a 5-letter word that's 5 × 26 = 130 candidates — much better than scanning the whole word list every time.
Time complexity: O(N × L²) where N is the dictionary size and L is the word length. The L² comes from the string slicing inside the inner loop.
Problem 3: Rotting Oranges (Multi-Source BFS)
Grid of cells: 0 (empty), 1 (fresh orange), 2 (rotten orange). Rotten oranges spread to adjacent fresh oranges each minute. Find the minimum minutes until no fresh orange remains, or return -1 if impossible.
Here's the twist: you have multiple starting points, and they all spread simultaneously. This is multi-source BFS — you seed the queue with every rotten orange at minute 0, then let BFS run. Each level represents one minute of spreading.
def oranges_rotting(grid):
rows, cols = len(grid), len(grid[0])
queue = deque()
fresh = 0
# Seed queue with ALL rotten oranges simultaneously
for r in range(rows):
for c in range(cols):
if grid[r][c] == 2:
queue.append((r, c))
elif grid[r][c] == 1:
fresh += 1
if fresh == 0:
return 0
minutes = 0
while queue:
for _ in range(len(queue)):
r, c = queue.popleft()
for dr, dc in [(-1,0),(1,0),(0,-1),(0,1)]:
nr, nc = r + dr, c + dc
if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 1:
grid[nr][nc] = 2 # mark rotten — grid IS the visited set
fresh -= 1
queue.append((nr, nc))
minutes += 1
return minutes - 1 if fresh == 0 else -1
A few things to notice here. First: we're mutating the grid to track visited cells instead of a separate set. That works when you own the grid. Second: the answer is minutes - 1 because the last level dequeue increments minutes once more after the final wave. Easy to get this off by one. Third — and this is the whole point of multi-source BFS — the "which rotten orange spreads first" question doesn't matter. They all spread in lockstep. Starting all of them in the queue at minute 0 handles this perfectly.
If you instead ran single-source BFS from each rotten orange and took the max, you'd get a wrong answer — because a fresh orange could be reachable from multiple rotten sources, and you'd be overcounting.
Problem 4: Level-order traversal on a tree
BFS isn't only for general graphs. Level-order (breadth-first) tree traversal is one of the most common interview patterns, and BFS is the natural implementation — DFS with recursion can do it, but you need a hashmap keyed by depth, which is messier.
from collections import deque
def level_order(root):
if not root:
return []
result = []
queue = deque([root])
while queue:
level_nodes = []
for _ in range(len(queue)):
node = queue.popleft()
level_nodes.append(node.val)
if node.left:
queue.append(node.left)
if node.right:
queue.append(node.right)
result.append(level_nodes)
return result
This pattern also directly solves "find the minimum depth of a binary tree" (return level the first time you hit a leaf), "right side view" (take the last element of each level), and "zigzag level order" (alternate reversing each level). All the same skeleton, tiny variations.
When BFS beats DFS
flowchart TD
Q{"What do\nyou need?"}
Q -->|"Shortest path\n(fewest hops)"| BFS["Use BFS"]
Q -->|"Level-by-level\nprocessing"| BFS
Q -->|"Wide graph,\nshallow answer"| BFS
Q -->|"All paths /\nfull exploration"| DFS["Use DFS"]
Q -->|"Cycle detection,\nSCCs"| DFS
Q -->|"Tree structure,\nrecursion feels natural"| DFS
Q -->|"Topological\nsorting"| TOPO["Kahn's BFS or\nDFS post-order"]
style BFS fill:#7c5cff,color:#fff
style DFS fill:#22d3ee,color:#0a0a0f
style TOPO fill:#00ff9d,color:#0a0a0f
A few concrete scenarios where BFS wins clearly:
Shortest path in unweighted graphs. This is BFS's defining advantage. If there are weights involved, reach for Dijkstra's instead. But if edges are unweighted (or all weight-1), BFS is Dijkstra's — just cheaper and simpler.
Finding anything "nearest." Nearest exit, nearest gate, nearest rotten orange. Especially with multi-source BFS. The wave-front expanding from all sources simultaneously gives you the "minimum over all sources" distance for every cell in one pass.
"Minimum steps / transformations / moves." Whenever a problem asks for the minimum number of steps to get from state A to state B, and each step is uniform-cost, it's BFS in disguise. The word ladder problem is classic. "Minimum knight moves on a chessboard," "shortest sliding puzzle solution" — same idea.
DFS wins when you're doing something structural with the graph — connected components, cycle detection, topological sort, finding all paths, backtracking. For those, see DFS patterns.
Complexity
| Scenario | Time | Space |
|---|---|---|
| BFS on a graph (adj. list) | O(V + E) | O(V) |
| BFS on a grid (R × C) | O(R × C) | O(R × C) |
| Word ladder (N words, length L) | O(N × L²) | O(N × L) |
| Multi-source BFS | O(V + E) | O(V) |
| Level-order tree traversal | O(N) | O(W) — max width |
The space bound for grids is the full O(R × C) in the worst case: think of a grid with no walls where BFS enqueues roughly half the cells at peak. Wide, open spaces are where BFS memory usage hurts. Contrast with DFS, whose stack depth is at most the longest path — better for tall, narrow graphs, worse for flat ones.
Anti-patterns and gotchas
Marking visited on dequeue. If you wait until you pop a node to mark it visited, you can enqueue it multiple times in the meantime. A node with 10 incoming edges gets enqueued 10 times. On a dense graph or grid, this is the difference between passing and TLE.
Using a list for the queue. list.pop(0) in Python shifts all elements — it's O(n) per operation, making your O(V + E) algorithm silently O(V²). Always from collections import deque.
Off-by-one on level counting. Track whether you increment level before or after processing the level. The range(len(queue)) snapshot pattern avoids this — increment level at the end of the outer while loop, after all nodes in the current wave are dequeued.
Forgetting disconnected components. If the problem asks about all nodes but the graph might be disconnected, run BFS from every unvisited node, not just from node 0. Same fix as DFS: outer loop over all nodes, inner BFS.
Modifying graph[node] while iterating it. Rare, but it happens when the graph is implicit and you're building neighbors on the fly. Store them in a list first.
The visited set is a design decision
For grid problems, you can often mark cells in-place (flip 0 to -1, or to the step count). This saves the overhead of a hash set — useful when you're fighting constant-factor time limits. The trade-off: it mutates the input, which matters if the problem asks you to leave the grid unchanged or run BFS multiple times.
For general graphs, use a proper set. And for problems where you need to reconstruct the actual path (not just its length), store a parent dictionary as you go: parent[neighbor] = current_node, then walk it backwards from the target.
Where to go next
BFS is one half of a coin. DFS patterns is the other — and many problems can be solved with either, so understanding both lets you choose deliberately. The underlying data structures are worth knowing cold: graphs covers adjacency lists, matrices, and how graph density affects your choice, and queues digs into the FIFO mechanics that make BFS tick.
If you want to extend BFS to weighted graphs, that's Dijkstra's algorithm — BFS with a priority queue instead of a plain queue. When edges have non-uniform costs, the wave-front ordering breaks unless you reorder by cumulative distance. That's a natural next step once this template feels automatic.
Frequently asked questions
▸Why does BFS guarantee the shortest path in an unweighted graph?
BFS visits nodes in strict order of their distance from the source — first all nodes one hop away, then all nodes two hops away, and so on. Because you expand the closest nodes first, the very first time you reach the destination you have taken the fewest possible edges to get there. DFS makes no such guarantee; it can dive deep down a long path before finding the target.
▸What data structure does BFS use, and why?
BFS uses a queue (FIFO). You enqueue the starting node, then repeatedly dequeue a node, process it, and enqueue its unvisited neighbors. The FIFO order is what enforces level-by-level expansion — everything one hop away gets processed before anything two hops away. Using a stack instead gives you DFS.
▸How do you track levels in BFS?
Two common ways. The cleaner approach: at the start of each level, snapshot the current queue length, then process exactly that many nodes. Everything dequeued in that batch is at the same depth. The other approach: store (node, depth) pairs in the queue. Both are O(V + E) and correct; the snapshot method is shorter to write.
▸What is multi-source BFS and when do you use it?
Multi-source BFS starts with multiple nodes already in the queue — as if you had a virtual super-source connected to all of them. You use it when you want the shortest distance from the nearest of several sources to each cell, rather than from one specific source. Classic examples: rotting oranges (all rotten oranges spread simultaneously), walls-and-gates (nearest gate for every empty room).
▸When should I use BFS instead of DFS?
Use BFS when shortest path (fewest hops) matters, when you want level-by-level processing, or when the graph is wide with a shallow answer. Use DFS when you need to explore all paths, detect cycles, compute connected components, or work on trees where recursion is natural. For the interview version of "find the shortest path" the answer is almost always BFS.
You may also like
Topological Sort
Order the nodes of a DAG so every edge points forward — the algorithm that drives build systems, package managers, course schedulers, and anything else that lives and dies by dependency ordering.
Shortest Paths (Dijkstra, Bellman-Ford, BFS)
Pick the right shortest-path algorithm the first time: BFS for unweighted graphs, Dijkstra with a heap for non-negative weights, Bellman-Ford when negative edges appear. Concrete worked examples, a decision table, and every gotcha that costs people the interview.
DFS Patterns
Depth-first search is the backbone of cycle detection, flood fill, path enumeration, and clone graph — master the visited-set template, the 3-color trick, and when to reach for DFS over BFS.