◆◆◆Advancedasked at Googleasked at Amazonasked at Uberasked at Meta

Shortest Paths (Dijkstra, Bellman-Ford, BFS)

Pick the right shortest-path algorithm the first time: BFS for unweighted graphs, Dijkstra with a heap for non-negative weights, Bellman-Ford when negative edges appear. Concrete worked examples, a decision table, and every gotcha that costs people the interview.

13 min read2026-01-20Ironclad Academy

#graphs #shortest-path #dijkstra #bellman-ford #bfs #algorithms

// DEPTH

the full breakdown — requirements, capacity, evolution, trade-offs

The decision you actually have to make

Here's the moment most people get wrong: they reach for Dijkstra by default because it's the famous one. That's fine if weights exist and are non-negative. But the question to ask first is simpler.

Does this graph even have meaningful weights?

If you're counting hops, steps, connections, or any scenario where every edge costs the same, BFS is strictly better. It's O(V + E), it's simpler to implement, and it gives you exactly the shortest path with no heap overhead. "Shortest path in an unweighted grid" is a BFS problem. "Minimum number of word changes in word ladder" is a BFS problem. Don't overcomplicate it.

flowchart TD
    START["Shortest path problem"] --> Q1{"Edge weights?"}
    Q1 -->|"No / all equal"| BFS["BFS — O(V + E)"]
    Q1 -->|"Yes"| Q2{"Negative weights?"}
    Q2 -->|"No"| Q3{"Hop count constraint?"}
    Q3 -->|"No"| DIJK["Dijkstra + heap — O(E log V)"]
    Q3 -->|"Yes (K stops)"| BF2["Bellman-Ford, K iterations"]
    Q2 -->|"Yes"| Q4{"Need cycle detection?"}
    Q4 -->|"No"| BF["Bellman-Ford — O(V · E)"]
    Q4 -->|"Yes"| BFC["Bellman-Ford + V-th pass check"]
    DIJK -->|"Have geometry / heuristic?"| ASTAR["Consider A*"]
    style BFS fill:#00ff9d,color:#0a0a0f
    style DIJK fill:#7c5cff,color:#fff
    style BF fill:#22d3ee,color:#0a0a0f
    style BF2 fill:#22d3ee,color:#0a0a0f
    style BFC fill:#22d3ee,color:#0a0a0f
    style ASTAR fill:#7c5cff,color:#fff

Commit this flowchart to memory. The rest of this article is just the mechanics behind each branch.

BFS: the shortest path you already know

If you've read the BFS patterns article, you already know this. BFS visits nodes in exact wave-fronts — all nodes at distance 1 before any at distance 2. That ordering property is why BFS finds shortest paths for free: the first time you reach the destination, you took the fewest hops possible.

from collections import deque

def bfs_shortest_path(graph, start, end):
    """Returns minimum hops from start to end. graph is adjacency list."""
    queue = deque([(start, 0)])    # (node, distance)
    visited = {start}

    while queue:
        node, dist = queue.popleft()
        if node == end:
            return dist
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, dist + 1))

    return -1  # unreachable

One queue, one visited set, dead simple. The key: mark nodes visited when you enqueue them, not when you dequeue them. Marking on dequeue lets duplicate entries pile up in the queue, wasting work.

BFS gives you wrong answers the moment edge weights differ. If the edge A→B costs 5 and A→C→B costs 1+1=2, BFS will tell you the shortest path to B has length 1 hop (via A→B) while totally missing the 2-hop path that's actually cheaper. That's when you need Dijkstra.

Dijkstra: greedy expansion with a heap

Dijkstra's insight is beautifully simple: at each step, expand the node you're most confident about — the one with the smallest known cumulative distance. Use a min-heap to find that node in O(log V).

Why greediness works here (and why negative edges break it)

When all weights are non-negative, the heap gives you a monotonically non-decreasing sequence of distances. Once you pop a node at distance d, no future path to that node can cost less than d — because any path going through unvisited nodes would add non-negative weight to d.

Negative edges shatter this guarantee. Suppose you finalize node B at distance 5, then later discover an edge X→B with weight -10 that reduces B's distance to -3. Dijkstra has already committed — it already processed B and ignored all updates to it. The answer is wrong and you won't know.

This isn't a "known limitation" — it's a fundamental correctness failure. If your problem has negative weights, Dijkstra is simply the wrong algorithm.

The relax-edges core

The operation at the heart of every shortest-path algorithm is relaxation: given the current known distance to node u and an edge u → v with weight w, check if going through u gives v a shorter path.

if dist[u] + w < dist[v]:
    dist[v] = dist[u] + w   # relax

That's it. Dijkstra and Bellman-Ford differ only in which edges they relax and when.

Heap-based Dijkstra — the implementation that actually works

import heapq

def dijkstra(graph, start, n):
    """
    graph: adjacency list as {node: [(neighbor, weight), ...]}
    start: source node (0-indexed)
    n:     total number of nodes
    Returns: dist[] where dist[v] = shortest distance from start to v
    """
    dist = [float('inf')] * n
    dist[start] = 0
    heap = [(0, start)]          # (cumulative_cost, node)

    while heap:
        cost, u = heapq.heappop(heap)

        # Lazy deletion: if we've already found a better path, skip
        if cost > dist[u]:
            continue

        for v, w in graph[u]:
            new_cost = cost + w
            if new_cost < dist[v]:
                dist[v] = new_cost
                heapq.heappush(heap, (new_cost, v))

    return dist

The if cost > dist[u]: continue line is lazy deletion — the practical alternative to a heap with decrease-key. When we discover a shorter path to v and push the new (new_cost, v) entry, the old stale entry is still in the heap. We don't remove it (heapq has no O(log n) remove). We let it get popped eventually and just skip it. The heap can hold O(E) entries in the worst case, which is why the time complexity is O(E log E) — and since E ≤ V², log E = O(log V), so it's O(E log V) either way.

The heap article covers the sift-up and sift-down mechanics if you want to understand why each push/pop costs O(log n).

Bellman-Ford: relax everything, V-1 times

Dijkstra is fast because it's greedy. Bellman-Ford is correct because it's thorough. It processes every edge, V-1 times.

Why V-1? A shortest path in a graph with V nodes visits at most V-1 edges (any more and you've revisited a node, i.e., a cycle). After k iterations of Bellman-Ford, you've found the shortest paths using at most k edges. By the (V-1)th iteration, you're done — unless there's a negative cycle.

def bellman_ford(edges, start, n):
    """
    edges: list of (u, v, weight)
    start: source node
    n:     total number of nodes
    Returns: dist[], or None if negative cycle exists
    """
    dist = [float('inf')] * n
    dist[start] = 0

    # Relax all edges V-1 times
    for _ in range(n - 1):
        updated = False
        for u, v, w in edges:
            if dist[u] != float('inf') and dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                updated = True
        if not updated:
            break   # converged early, no need to continue

    # V-th iteration: if anything still relaxes, there's a negative cycle
    for u, v, w in edges:
        if dist[u] != float('inf') and dist[u] + w < dist[v]:
            return None  # negative cycle detected

    return dist

O(V · E) is genuinely slower than Dijkstra's O(E log V), but it's the price for correctness with negative edges. For the problem sizes you see in interviews (V ≤ 100, E ≤ 500), it's completely fine.

One optimization worth noting: the early-break check (if not updated: break) makes Bellman-Ford terminate early when the graph converges. In practice, on many graphs it finishes well before V-1 iterations.

Worked problems

Problem 1: Network Delay Time (Dijkstra)

Given a network of n nodes and a list of directed travel times (u, v, w), find the time it takes for a signal sent from node k to reach all nodes. Return -1 if any node is unreachable.

This is a textbook single-source shortest path with non-negative weights. Dijkstra.

import heapq
from collections import defaultdict

def network_delay_time(times, n, k):
    graph = defaultdict(list)
    for u, v, w in times:
        graph[u].append((v, w))

    dist = {i: float('inf') for i in range(1, n + 1)}
    dist[k] = 0
    heap = [(0, k)]

    while heap:
        cost, u = heapq.heappop(heap)
        if cost > dist[u]:
            continue
        for v, w in graph[u]:
            if cost + w < dist[v]:
                dist[v] = cost + w
                heapq.heappush(heap, (dist[v], v))

    max_dist = max(dist.values())
    return max_dist if max_dist < float('inf') else -1

The answer is the maximum value in dist[] — the time when the last node finally receives the signal. If any dist[v] is still inf, a node is unreachable.

Complexity: O(E log V). At most E pushes to the heap, each costing O(log E) = O(log V).

Problem 2: Cheapest Flights Within K Stops (Bellman-Ford, fixed iterations)

Given n cities, flights as (from, to, price), find the cheapest path from src to dst using at most K stops (K+1 edges). Return -1 if no such path exists.

The K-stops constraint is the tell. Dijkstra doesn't care about how many edges a path uses — once it finalizes a node's distance, it can't constrain the path structure. Bellman-Ford's iteration-based structure fits naturally: run exactly K+1 iterations to find paths using at most K+1 edges.

The critical implementation detail: you must use a copy of the previous iteration's distances when relaxing, or a single path can "chain" across multiple edges in one iteration — violating the K-stops constraint.

def find_cheapest_price(n, flights, src, dst, k):
    INF = float('inf')
    dist = [INF] * n
    dist[src] = 0

    for i in range(k + 1):  # k stops = k+1 edges
        temp = dist[:]       # snapshot: relaxations within one pass can only use previous-pass distances
        for u, v, price in flights:
            if dist[u] != INF and dist[u] + price < temp[v]:
                temp[v] = dist[u] + price
        dist = temp

    return dist[dst] if dist[dst] != INF else -1

Why temp = dist[:]? Without it, a chain u→v→w could both relax in the same pass if u comes before v in the flights list. That would mean a single Bellman-Ford iteration covers two edges, so K=1 could actually traverse 2 or more edges. The snapshot enforces strict edge-count accounting.

Complexity: O(K · E). K iterations, each scanning all E flights.

Problem 3: Path With Minimum Effort (Dijkstra on a grid)

A 2D grid of heights. Moving between adjacent cells costs |height difference|. Find the path from top-left to bottom-right that minimizes the maximum single-step effort along the path.

This one has a fun twist: the "distance" you're minimizing is the max edge weight on the path, not the sum. But Dijkstra still works — you just redefine what goes in the heap.

import heapq

def minimum_effort_path(heights):
    rows, cols = len(heights), len(heights[0])
    effort = [[float('inf')] * cols for _ in range(rows)]
    effort[0][0] = 0
    heap = [(0, 0, 0)]   # (max_effort_so_far, row, col)

    directions = [(0,1),(0,-1),(1,0),(-1,0)]

    while heap:
        cost, r, c = heapq.heappop(heap)
        if cost > effort[r][c]:
            continue
        if r == rows - 1 and c == cols - 1:
            return cost
        for dr, dc in directions:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                step = abs(heights[nr][nc] - heights[r][c])
                new_cost = max(cost, step)   # minimax: take the max effort on this path
                if new_cost < effort[nr][nc]:
                    effort[nr][nc] = new_cost
                    heapq.heappush(heap, (new_cost, nr, nc))

    return effort[rows-1][cols-1]

The graph article covers the grid-as-graph mental model — each cell is a node, each adjacent cell is an edge. Here, every edge weight is the absolute height difference. We're running Dijkstra where "distance" means "maximum single-step cost," and the heap still correctly expands the cell with the current smallest max-effort.

Complexity: O(rows × cols × log(rows × cols)). Every cell is a node, every edge is an adjacency.

A word on A*

A* is Dijkstra with a heuristic. Instead of expanding nodes in order of g (cost from source), A* uses f = g + h where h is a heuristic estimate of the cost to the goal.

When h is admissible (never overestimates), A* is optimal and often dramatically faster than Dijkstra — it focuses exploration toward the goal. The classic h for map problems is straight-line (Euclidean) distance: you can't do better than flying in a straight line.

In interview contexts: if the problem gives you coordinates and asks for shortest path on a map, mention A* as an optimization. For everything else, Dijkstra is simpler and always correct. A* is a production concern (game pathfinding, Google Maps routing) more than a core interview topic.

The pitfalls people actually hit

Forgetting to mark nodes visited before enqueuing in BFS. In shortest-path BFS on a grid, if you add the neighbor to the queue but mark it visited only when you dequeue it, every cell can be enqueued multiple times. On a 1000×1000 grid that's the difference between O(n²) and O(n⁴). Mark on enqueue.

Using Dijkstra with negative weights and not catching the wrong answer. This is insidious because Dijkstra won't throw an exception — it'll return a plausible-looking but wrong answer. In an interview, if the problem says "weights can be negative," explicitly say "I'd use Bellman-Ford here because negative edges break Dijkstra's greedy invariant."

Not copying dist in the K-stops Bellman-Ford. The temp-copy pattern is easy to forget under pressure and produces a subtle bug: paths relax across too many edges in one pass, so you might find a "K=1 stops" solution that actually uses 5 edges. Always snapshot before the inner loop.

Treating directed graphs as undirected. If flights go from A to B, that doesn't mean you can fly from B to A. Build the adjacency list with directed edges only. A very common source of wrong answers.

Re-pushing stale entries without lazy deletion. In Python's heapq there's no decrease-key, so you push new (cost, node) pairs without removing the old ones. If you forget the if cost > dist[u]: continue guard, you'll process nodes multiple times and potentially return wrong answers or TLE.

Initializing dist[start] = 0 but forgetting to add start to the heap. Then the heap is empty immediately and you return all-inf distances. Obvious in testing, catastrophic on a timed submission.

When NOT to use these algorithms

Situation	What to use instead
You need shortest paths between ALL pairs of nodes	Floyd-Warshall — O(V³) but one pass for all pairs
Graph is a DAG and you want shortest/longest path	Topological sort + DP — O(V + E), no heap needed
You want any path, not the shortest	Plain DFS — O(V + E), much simpler
All edges have weight 1 and you want fewest hops	BFS patterns — O(V + E), no heap needed
Minimum spanning tree (not a path)	Prim's or Kruskal's — different problem entirely

The last one trips people up. "Find the minimum cost to connect all nodes" is a minimum spanning tree problem, not shortest path. Dijkstra and Bellman-Ford find paths, not trees. The algorithms look structurally similar but solve different things.

Complexity table

Algorithm	Time	Space	Notes
BFS (unweighted)	O(V + E)	O(V)	Level-order expansion; see BFS patterns
Dijkstra (heap)	O(E log V)	O(V + E)	Lazy deletion; heap holds up to E entries
Dijkstra (array)	O(V²)	O(V)	Better for dense graphs where E ≈ V²
Bellman-Ford	O(V · E)	O(V)	V-th pass detects negative cycles
Bellman-Ford (K iters)	O(K · E)	O(V)	Cheapest flights pattern
Floyd-Warshall	O(V³)	O(V²)	All-pairs; adjacency matrix

For interview problems, you're almost always in either the Dijkstra bucket (weighted graph, no constraints on hops) or the Bellman-Ford bucket (negative weights, or hop-count constraint). Get both templates into muscle memory. The decision is fast once you see the edge-weight structure.

The heap article is worth revisiting if the heap mechanics feel shaky — the O(log n) push and pop that make Dijkstra work are just sift-up and sift-down on an array-backed binary tree. And if you're still internalizing the graph model itself, graphs covers adjacency lists, directed vs. undirected, and weighted representations in detail.

// FAQ

Frequently asked questions

▸When should I use BFS vs Dijkstra for shortest paths?

BFS is sufficient — and faster — when all edges have the same weight (or no weight at all). It visits nodes in strict hop-count order and finds the shortest path in O(V + E). Dijkstra handles the case where edges have different non-negative weights; it uses a min-heap to always expand the currently cheapest node, giving O(E log V). If every edge has weight 1, BFS and Dijkstra give the same answer but BFS is simpler and faster.

▸Why does Dijkstra break on negative-weight edges?

Dijkstra's greedy assumption is: once a node's shortest distance is finalized (it's been popped from the heap), no shorter path to it will ever be found. Negative edges violate this — a longer-seeming path that goes through a negative edge could later prove cheaper. The algorithm may finalize a node's distance too early. If negative edges are possible, use Bellman-Ford (which relaxes all edges V-1 times, catching any valid path regardless of sign) or re-weight edges if you control the graph.

▸What is the time complexity of Dijkstra's algorithm?

With an adjacency list and a binary min-heap (or Python heapq), Dijkstra runs in O((V + E) log V), which is often written as O(E log V) for connected graphs where E ≥ V. Each edge triggers at most one heap push (O(log V)) and each vertex is popped once. The original array-based Dijkstra is O(V²) — fine for dense graphs but worse for sparse ones.

▸What is the Bellman-Ford algorithm and when do I need it?

Bellman-Ford relaxes every edge V-1 times. After k iterations, it has found all shortest paths that use at most k edges. Running a V-th iteration and checking for further relaxation detects negative cycles. Use Bellman-Ford when: (1) edges can have negative weights, (2) you need to detect negative cycles, or (3) the problem caps the number of edges in the path (like 'cheapest flights within K stops' — just run K+1 iterations of Bellman-Ford).

▸What is A* and how is it different from Dijkstra?

A* is Dijkstra with a heuristic. Instead of expanding the node with the smallest known distance (g), A* expands the node with the smallest g + h, where h is a heuristic estimate of the remaining distance to the goal. If the heuristic is admissible (never overestimates), A* finds the optimal path, often visiting far fewer nodes. Classic uses: pathfinding on maps where Euclidean distance is a good heuristic. In most interview contexts Dijkstra is sufficient; A* comes up in game dev and robotics.

← previous

Subsets, Permutations & Combinations

Prefix Sums

// RELATED