Graphs
The data structure that models everything connected — social networks, road maps, dependency chains, and more. Master representations, traversals, and the "model it as a graph" reframe that unlocks whole categories of problems.
Why graphs exist — and why you need them
Here is a problem you will hit eventually: you have a word, "hit", and you want to transform it to "cog" by changing one letter at a time, where every intermediate word must be valid English. What is the fewest number of steps?
That is a shortest-path problem. The vertices are words. Two words share an edge if they differ by exactly one letter. The answer is BFS — and the moment you see that, the problem is solved. You just needed the graph framing to get there.
This is the real reason to study graphs. Not the data structure itself, but the reframe: once you recognize that a problem has vertices and edges hiding in it, a whole toolkit clicks into place. Grids become graphs. Dependency chains become directed acyclic graphs. State-space search becomes a path problem. That pattern recognition is worth more than memorizing any single algorithm.
Directed, undirected, weighted — pick your flavor
Before representations, you need to pick what kind of graph you're dealing with.
Undirected vs. directed. A road network where every street is two-way is undirected — if there is an edge between A and B, you can travel either direction. Twitter follows are directed — you can follow someone who does not follow you back. Directed graphs (digraphs) are more general; undirected graphs are digraphs where every edge runs both ways.
Unweighted vs. weighted. An unweighted graph just records whether a connection exists. A weighted graph attaches a number to each edge — distance in kilometers, latency in milliseconds, cost in dollars. Shortest path in an unweighted graph is BFS (fewest hops). Shortest path in a weighted graph is Dijkstra's or Bellman-Ford.
In code, the difference between directed and undirected is usually one line: when you add an edge (u, v), you either add it one way (directed) or both ways (undirected).
# Undirected — add both directions
graph[u].append(v)
graph[v].append(u)
# Directed — add one direction only
graph[u].append(v)
Representations: the choice that ripples everywhere
This is the decision that shapes everything downstream. You have two main options.
Adjacency list
Store a dictionary (or array) where graph[u] is the list of vertices that u connects to.
# Example: a social graph
graph = {
"alice": ["bob", "carol"],
"bob": ["alice", "dave"],
"carol": ["alice"],
"dave": ["bob"],
}
Space: O(V + E) — you only store what exists. If 10 million users each have an average of 200 connections, that is 2 billion edges, but you only use memory for the edges that are actually there. No wasted space for pairs that are not connected.
Checking whether an edge (u, v) exists requires scanning graph[u], which is O(degree of u) in the worst case. For social graphs where degrees are small, this is fine.
Adjacency matrix
Store a V×V grid. matrix[u][v] = 1 if there is an edge from u to v, 0 otherwise.
# 4-vertex graph: 0↔1, 1↔2, 0↔3
V = 4
matrix = [[0] * V for _ in range(V)]
matrix[0][1] = matrix[1][0] = 1
matrix[1][2] = matrix[2][1] = 1
matrix[0][3] = matrix[3][0] = 1
Space: O(V²) — always, even if only 10 of those V² cells are 1. For 100,000 users, that is 10 billion cells. Not happening.
Edge check: O(1) — matrix[u][v] is a direct array lookup. This is the matrix's one killer advantage.
When to use which
| Adjacency List | Adjacency Matrix | |
|---|---|---|
| Space | O(V + E) | O(V²) |
| Check edge (u, v) | O(degree) | O(1) |
| Iterate all neighbors of u | O(degree of u) | O(V) |
| Best for | Sparse graphs | Dense graphs or small V |
| Typical examples | Social networks, road maps, web graphs | Floyd-Warshall, small game grids |
The honest answer: use an adjacency list unless you have a specific reason not to. Real graphs are almost always sparse — most vertices connect to a tiny fraction of all others. A V×V matrix is wasteful unless V is in the hundreds and you need constant-time edge checks.
flowchart TD
A["Is V small (< ~500)?"] -->|Yes| B["Adjacency matrix is viable"]
A -->|No| C["Adjacency list"]
B --> D["Do you need O(1) edge checks or use a matrix algorithm?"]
D -->|Yes| E["Use matrix"]
D -->|No| C
style C fill:#7c5cff,color:#fff
style E fill:#22d3ee,color:#0a0a0f
Traversal: the engine of everything
Every graph algorithm you care about — shortest path, cycle detection, connected components, topological sort — is built on top of one of two traversals. Understand these and everything else is a variation.
{ "type": "graph", "title": "BFS vs DFS — same graph, different exploration order" }
Step through both modes. BFS visits all nodes at distance 1 before any at distance 2. DFS commits to one path until it hits a dead end, then backtracks. The queue vs. stack data structure is the only implementation difference.
BFS
Use a queue. Process each node, add its unvisited neighbors, repeat.
from collections import deque
def bfs(graph, start):
visited = {start}
queue = deque([start])
order = []
while queue:
node = queue.popleft()
order.append(node)
for neighbor in graph[node]:
if neighbor not in visited:
visited.add(neighbor)
queue.append(neighbor)
return order
BFS naturally gives you shortest paths in an unweighted graph. When you first reach a node via BFS, you have taken the minimum number of edges to get there. This is not true for DFS. That asymmetry — shortest path is BFS's superpower — is the single most important thing to know about choosing between them.
DFS
Use a stack (or recursion, which implicitly uses the call stack).
def dfs(graph, start, visited=None):
if visited is None:
visited = set()
visited.add(start)
result = [start]
for neighbor in graph[start]:
if neighbor not in visited:
result.extend(dfs(graph, neighbor, visited))
return result
DFS is better for: checking if a path exists between two nodes, detecting cycles, topological sort, finding strongly connected components, and any problem where you need to explore a complete branch before backtracking. Problems where the answer involves the full structure of a path rather than its length lean DFS.
The rule of thumb: fewest hops → BFS; full structure / reachability / cycles → DFS.
The grid-as-graph reframe
One of the highest-leverage insights in graph problems: a 2-D grid is a graph. Each cell is a vertex. Adjacent cells (up/down/left/right) share an edge. You rarely need to build this graph explicitly — you just run BFS or DFS directly using the grid indices.
Number of islands: treat each '1' cell as a vertex. DFS from each unvisited '1', marking everything reachable as visited. Count how many times you start a fresh DFS. Done — you never needed to build an adjacency list.
Shortest path in a binary matrix: BFS from the top-left corner, treating 0-cells as passable. BFS guarantees you find the shortest path in terms of cell count.
def shortest_path(grid):
R, C = len(grid), len(grid[0])
if grid[0][0] == 1 or grid[R-1][C-1] == 1:
return -1
queue = deque([(0, 0, 1)]) # (row, col, path_length)
visited = {(0, 0)}
directions = [(-1,-1),(-1,0),(-1,1),(0,-1),(0,1),(1,-1),(1,0),(1,1)]
while queue:
r, c, dist = queue.popleft()
if r == R - 1 and c == C - 1:
return dist
for dr, dc in directions:
nr, nc = r + dr, c + dc
if 0 <= nr < R and 0 <= nc < C and grid[nr][nc] == 0 and (nr, nc) not in visited:
visited.add((nr, nc))
queue.append((nr, nc, dist + 1))
return -1
The moment you see a grid problem asking for number of islands, connected regions, or shortest path, translate it mentally: cells are vertices, adjacency is edges, and your standard traversal runs unchanged.
Modeling other problems as graphs
The word-ladder example at the top is the canonical "graph hiding in disguise" problem. Here are the other patterns worth recognizing:
Dependency resolution. A package manager needs to install packages in the right order — install A before B if B depends on A. This is a directed acyclic graph (DAG) and the algorithm is topological sort. Cycle in the dependency graph? That is a circular dependency error.
State-space search. A sliding puzzle, a Rubik's cube, or a lock combination problem where you need the fewest moves — the states are vertices, the valid moves are edges, and BFS finds the minimum number of moves. Every state-space search problem is a shortest-path problem.
Network connectivity. Is every router in the network reachable from the gateway? That is a connected-components question, answerable with one BFS or DFS.
flowchart LR
A["Grid problem<br/>(islands, flood fill)"] --> G["Graph: cells = vertices<br/>adjacency = edges"]
B["Dependency problem<br/>(packages, tasks)"] --> G
C["Shortest transformation<br/>(word ladder, puzzle)"] --> G
D["Network / reachability<br/>(connected components)"] --> G
G --> H["BFS or DFS"]
style G fill:#7c5cff,color:#fff
style H fill:#00ff9d,color:#0a0a0f
When you see these shapes, you do not need to reinvent anything — you are just running BFS or DFS on a graph you built from the problem.
A worked example: course schedule (cycle detection)
You have n courses numbered 0 to n-1. Some have prerequisites — [1, 0] means you must take course 0 before course 1. Can you finish all courses?
This is: does the directed graph of prerequisites contain a cycle? If it does, there is a circular dependency and no valid ordering exists.
def can_finish(num_courses, prerequisites):
# Build adjacency list
graph = [[] for _ in range(num_courses)]
for course, prereq in prerequisites:
graph[prereq].append(course)
# 0 = unvisited, 1 = in current DFS path, 2 = done
state = [0] * num_courses
def has_cycle(node):
if state[node] == 1: # back edge — cycle found
return True
if state[node] == 2: # already fully explored
return False
state[node] = 1 # mark as in-progress
for neighbor in graph[node]:
if has_cycle(neighbor):
return True
state[node] = 2 # mark as done
return False
return not any(has_cycle(v) for v in range(num_courses) if state[v] == 0)
The three-state trick is the key insight. One state for "not visited", one for "currently on the DFS stack" (in-progress), and one for "fully explored and safe". If you hit an in-progress node, you found a back edge — a cycle. This runs in O(V + E).
For the actual course ordering (when there is no cycle), you extend this into a full topological sort by appending nodes to a result list after they are fully explored.
Shortest path — the teaser
BFS gives you shortest paths in unweighted graphs. For weighted graphs, you need Dijkstra's algorithm — a BFS-like process that uses a priority queue instead of a regular queue, always expanding the cheapest known path next. It runs in O((V + E) log V) with a min-heap.
Dijkstra fails when edges have negative weights (it assumes you can never improve a path by extending it, which is false with negative weights). For negative edges, Bellman-Ford runs O(VE) and handles them correctly.
These algorithms each deserve their own page — see BFS patterns and DFS patterns for the template code and the interview problems built on each. The crash course introduction to graphs covers the conceptual foundations if you want to back up.
Pitfalls people actually hit
Forgetting to track visited nodes. In a graph with cycles, BFS and DFS without a visited set will spin forever. Always initialize your visited set/dict before the traversal.
Using a list instead of a set for visited. if node in visited_list is O(n). if node in visited_set is O(1). On a graph with a million nodes, this is the difference between fast and timing out.
Assuming the graph is connected. If the problem says "find all connected components" or the graph might be disconnected, you need to run the traversal from every unvisited vertex, not just one.
Conflating undirected cycle detection with directed. In an undirected graph, during DFS, your parent is not a cycle — skip it. In a directed graph, use the three-state approach; the parent-skip trick does not apply.
Rebuilding the adjacency list inside the loop. In Python specifically, if your graph is built from an edge list, build it once before the traversal, not inside a BFS/DFS loop.
Matrix graphs with the wrong neighbor check. Off-by-one on bounds (0 <= r < R and 0 <= c < C), or forgetting to check the cell value before adding to the queue. These are the bugs that eat 20 minutes in an interview.
When graphs are not the right call
Graphs are the right model for relational problems — vertices and edges. They are overkill (and slower) when simpler structures fit:
- Pure key-value lookups → hash table. No need to model relationships.
- Ordered data with fast search → balanced BST or sorted array + binary search.
- Always-need-the-minimum → heap. A priority queue is not a general graph.
- Simple parent-child hierarchy with no cross-links → a tree (which is a special case of a directed acyclic graph, but you do not need the full graph machinery).
The tell that you do need a graph: the data has arbitrary, many-to-many relationships, or paths and reachability are what the problem cares about. Any time you catch yourself saying "this thing connects to multiple other things and I need to find a route or check whether things are reachable," you're looking at a graph.
Master the two representations, internalize the BFS-vs-DFS choice, and drill the grid-as-graph reframe. Those three moves cover the vast majority of graph interview problems you will see at any company.
Frequently asked questions
▸What is the difference between an adjacency list and an adjacency matrix?
An adjacency list stores each vertex alongside the list of its neighbors — space is O(V + E), proportional to what actually exists. An adjacency matrix is a V×V grid where cell [u][v] is 1 if an edge exists — space is always O(V²) regardless of how many edges there are. Use a list when the graph is sparse (most graphs in practice); use a matrix when V is small and you need O(1) edge lookups constantly, or when the algorithm is matrix-based (Floyd-Warshall, for example).
▸What is the difference between directed and undirected graphs?
In an undirected graph, edges go both ways — if you can travel from A to B, you can also travel from B to A, and you store one edge representing both. In a directed graph (digraph), edges have direction — A → B does not imply B → A. Real-world examples: a road network with two-way streets is undirected; a Twitter follow graph or a dependency tree is directed. The distinction matters for both representation (undirected edges appear in two adjacency lists) and algorithms (cycle detection, topological sort only make sense on directed graphs).
▸When should I use BFS vs DFS on a graph?
BFS (breadth-first search) explores layer by layer — it naturally finds the shortest path in an unweighted graph and is the right choice when you need minimum hops or want to find the closest matching node. DFS (depth-first search) commits to one branch before backtracking — it is better for detecting cycles, topological ordering, strongly connected components, and problems where you need to explore full paths. For most graph interview problems, the first question is: does the problem care about shortest path (reach for BFS) or full reachability / structure (reach for DFS)?
▸How do you detect a cycle in a graph?
In an undirected graph, run DFS and track the parent of each vertex. If you reach a neighbor that is already visited and is not the parent you came from, you have a cycle. In a directed graph, track three states per vertex — unvisited, in the current DFS stack, and done. If you hit a vertex that is currently on the stack, you have a cycle. This is also the foundation of topological sort: a directed graph can be topologically ordered if and only if it has no cycles (it is a DAG).
▸How do you solve grid problems as graph problems?
A 2-D grid is a graph where each cell is a vertex, and two cells share an edge if they are adjacent (typically 4-directional, sometimes 8-directional). You do not need to build the graph explicitly — just run BFS or DFS directly on the grid using a visited matrix or by mutating the grid in place. Classic examples: number of islands, shortest path in a binary matrix, flood fill, word search. The graph framing makes the algorithm choice (BFS for shortest path, DFS for connected components) immediately obvious.
You may also like
Union-Find (Disjoint Set Union)
The near-magic data structure for connectivity queries — are these two nodes in the same component? Nearly O(1) per operation with union by rank and path compression. Kruskal's MST, redundant connections, account merging, and more.
Tries (Prefix Trees)
The data structure powering autocomplete, spell-check, and IP routing — a tree where each edge is a character and every path from the root spells a prefix. Insert, search, and prefix queries all run in O(L) on word length alone, completely independent of how many words you store.
Strings
Strings look like simple text, but they hide a brutal trap: naive concatenation in a loop is O(n²). Learn why, how encoding actually works, and the handful of patterns — sliding window, two pointers, hashing — that solve 80% of string interview problems.