Dynamic Programming for Beginners
Dynamic programming demystified — learn how overlapping subproblems and optimal substructure turn exponential brute-force recursion into clean O(n) solutions, starting from Fibonacci.
The moment recursion becomes a problem
Here is the naive Fibonacci in Python. Take ten seconds and actually read it — this is the starting point for everything that follows.
def fib(n):
if n <= 1:
return n
return fib(n - 1) + fib(n - 2)
For fib(50) your laptop will sit there for minutes. For fib(100) it will not finish in your lifetime. But the formula is correct — the flaw is purely in execution.
Why is it so slow? Every call to fib(n) spawns two more calls. Those two each spawn two more. The call tree doubles at every level, so it has roughly 2^n nodes. fib(50) triggers roughly 2^50 ≈ 1 quadrillion function calls. All to compute a single number that fits in 64 bits.
The brutal irony: most of those calls are duplicates. To compute fib(5):
flowchart TD
F5["fib(5)"] --> F4["fib(4)"] & F3a["fib(3)"]
F4 --> F3b["fib(3)"] & F2a["fib(2)"]
F3a --> F2b["fib(2)"] & F1a["fib(1)"]
F3b --> F2c["fib(2)"] & F1b["fib(1)"]
F2a --> F1c["fib(1)"] & F0a["fib(0)"]
F2b --> F1d["fib(1)"] & F0b["fib(0)"]
F2c --> F1e["fib(1)"] & F0c["fib(0)"]
style F3a fill:#7c5cff,color:#fff
style F3b fill:#7c5cff,color:#fff
style F2a fill:#22d3ee,color:#0a0a0f
style F2b fill:#22d3ee,color:#0a0a0f
style F2c fill:#22d3ee,color:#0a0a0f
fib(3) is computed twice. fib(2) is computed three times. Every purple node is wasted work. This is what overlapping subproblems looks like on paper — and why DP exists.
Step one: memoization (top-down DP)
The fix is almost offensively simple. Before computing fib(n), check if you already have the answer. If you do, return it immediately. If you don't, compute it, save it, then return.
def fib_memo(n, cache={}):
if n in cache:
return cache[n] # already solved — free lookup
if n <= 1:
return n
result = fib_memo(n - 1, cache) + fib_memo(n - 2, cache)
cache[n] = result # store before returning
return result
That is memoization. The name comes from "memo" — you're jotting down answers so you don't rethink them. Three lines changed (check, store, return) and you go from O(2^n) to O(n).
Why O(n)? Because there are only n distinct inputs: fib(0), fib(1), ..., fib(n). With the cache, each one is computed exactly once. The recursion tree that used to have 2^n nodes now collapses into a DAG with n nodes, each visited once.
In Python, the idiomatic way is functools.lru_cache, which handles the cache dictionary for you:
from functools import lru_cache
@lru_cache(maxsize=None)
def fib(n):
if n <= 1:
return n
return fib(n - 1) + fib(n - 2)
Same logic, no manual cache management. fib(1000) returns instantly.
Step two: tabulation (bottom-up DP)
Memoization is top-down — you start at the big problem and let recursion pull in the subproblems it needs. Tabulation flips the direction: start at the smallest subproblems and build your way up to the answer.
def fib_table(n):
if n <= 1:
return n
dp = [0] * (n + 1) # dp[i] = fib(i)
dp[0] = 0
dp[1] = 1
for i in range(2, n + 1):
dp[i] = dp[i - 1] + dp[i - 2]
return dp[n]
Walk through this for n = 6:
| i | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| dp[i] | 0 | 1 | 1 | 2 | 3 | 5 | 8 |
Each cell depends only on the two cells before it. No recursion, no call stack, no risk of stack overflow on large inputs. Just a loop.
And you can do even better on space — since dp[i] only needs the previous two values, you can throw away the table entirely:
def fib_optimal(n):
if n <= 1:
return n
a, b = 0, 1
for _ in range(2, n + 1):
a, b = b, a + b
return b
O(n) time, O(1) space. That is the full journey: O(2^n) naive → O(n) time / O(n) space with memo or table → O(n) time / O(1) space with the optimized table.
The two properties, made concrete
Before you try to apply DP to a new problem, you need to confirm both properties hold. Fibonacci makes them obvious, but they show up everywhere.
Overlapping subproblems means the recursion revisits the same inputs. If you drew the recursion tree and every node were unique, there would be nothing to cache — that is divide-and-conquer territory (merge sort, binary search). When nodes repeat, DP wins.
Optimal substructure means you can assemble the correct answer to the big problem from correct answers to smaller problems. For Fibonacci this is trivial: fib(n) = fib(n-1) + fib(n-2), no need to reconsider once you have those two. For "shortest path," it means: the shortest path from A to C through B is the shortest path A→B plus the shortest path B→C. This property is what lets you trust the cached answers — if the subproblem solutions were only locally optimal and could be invalidated later, the approach would break down.
Greedy algorithms also exploit optimal substructure but assume you never need to revisit a choice. DP is what you reach for when the greedy assumption does not hold and you genuinely need to consider all possibilities — but only solve each once.
The mechanical recipe
Whenever you face a new problem that smells like DP, work through these four steps in order. Do not skip to code.
-
Define the subproblem in English. For Fibonacci: "let
dp[i]be the i-th Fibonacci number." For "minimum coins to make amount X": "letdp[x]be the minimum number of coins to make amount x." Getting this wrong is the root cause of almost every failed DP attempt. -
Write the recurrence. How does
dp[i]depend on smaller values? For Fibonacci:dp[i] = dp[i-1] + dp[i-2]. This step is the insight — the rest is bookkeeping. -
Identify base cases. Where does the recursion bottom out?
dp[0] = 0,dp[1] = 1for Fibonacci. Without correct base cases you get infinite recursion or wrong answers propagated forward. -
Choose top-down or bottom-up. They are equivalent in power. Top-down (memoization) is usually easier to derive because it mirrors the recursive formulation. Bottom-up (tabulation) is easier to optimize for space and avoids Python's recursion limit (default 1000 frames).
flowchart LR
A["Define the subproblem"] --> B["Write the recurrence"]
B --> C["Base cases"]
C --> D{"Top-down or<br/>bottom-up?"}
D -->|"easier to write"| E["Memoization\n+ lru_cache"]
D -->|"space-efficient"| F["Tabulation\n(fill table)"]
style A fill:#7c5cff,color:#fff
style B fill:#7c5cff,color:#fff
style C fill:#7c5cff,color:#fff
style E fill:#00ff9d,color:#0a0a0f
style F fill:#00ff9d,color:#0a0a0f
What makes DP different from plain recursion
This is the conceptual jump that confuses people most. You already knew recursion. You already knew that a function can call itself. So what is new?
The new thing is the decision to reuse answers instead of recompute them, combined with the recognition that certain problems have a structure that makes reuse valid. Plain recursion explores a tree. DP collapses that tree into a DAG by merging identical nodes. The tree has exponential size; the DAG has polynomial size.
Here is the same comparison in terms you might remember from Big-O:
| Approach | Call tree nodes | Work per node | Total |
|---|---|---|---|
| Naive recursion | ~2^n | O(1) | O(2^n) |
| Memoized recursion | n (unique) | O(1) | O(n) |
| Bottom-up table | n iterations | O(1) | O(n) |
The exponential comes entirely from redundant recomputation. Eliminate the redundancy and you are left with linear work. That is not a trick — it is the point.
A second example: the staircase problem
Once you see the pattern in Fibonacci, you start spotting it everywhere. Consider this classic:
You are climbing a staircase with
nsteps. You can take 1 or 2 steps at a time. How many distinct ways can you reach the top?
Define the subproblem: ways(i) = number of distinct ways to reach step i.
To reach step i, you either came from step i-1 (took 1 step) or from step i-2 (took 2 steps). So:
ways(i) = ways(i-1) + ways(i-2)
That is Fibonacci with different base cases (ways(1) = 1, ways(2) = 2). The recurrence is identical. This is not a coincidence — many DP problems reduce to the same underlying recurrence under different labels. Once you internalize the shape, you recognize it fast.
def climb_stairs(n):
if n <= 2:
return n
a, b = 1, 2
for _ in range(3, n + 1):
a, b = b, a + b
return b
Pitfalls people actually hit
Off-by-one in the table size. If your subproblem is dp[0] through dp[n], you need an array of size n + 1. dp = [0] * n and then accessing dp[n] is an index error waiting to happen. Annoying, common, easy to miss.
Wrong subproblem definition. If dp[i] does not mean a consistent, well-defined thing, your recurrence will produce garbage. Write the definition in a comment before you write any code. "dp[i] is the ___" — fill in that blank precisely.
Forgetting that memoization has a recursion depth limit. Python defaults to 1000 recursive calls. For fib(2000) with top-down memoization, you will hit RecursionError. Either use bottom-up tabulation, or call sys.setrecursionlimit() — though the latter is a band-aid.
Confusing memoization with caching in general. Memoization is valid only when the function is pure — same inputs always produce same outputs, no side effects. If your recursive function depends on global mutable state, a cache will serve stale answers. Fibonacci is pure. A function that reads from a database is not.
Jumping straight to 2-D DP before nailing 1-D. A huge fraction of hard DP interview questions use a 2-D table — dp[i][j] over two sequences, or a grid. But every 2-D DP problem is just a 1-D DP problem applied twice. Get the 1-D pattern so automatic it feels boring, then the 2-D extension is straightforward.
When DP is not the answer
DP solves problems with overlapping subproblems and optimal substructure. But not every problem has both.
- If subproblems do not overlap, divide and conquer (merge sort, binary search) is faster and simpler — no cache overhead, no table to fill.
- If you can always make a locally optimal choice without backtracking, a greedy algorithm is faster and uses less space. Dijkstra's algorithm is technically greedily optimal given a non-negative-weight constraint; without that constraint you need Bellman-Ford (which is DP).
- If the state space is too large to enumerate — say,
dp[amount]for amounts up to 10^18 — you need matrix exponentiation, number-theoretic tricks, or a completely different angle.
The honest heuristic: if the problem says "count the ways," "minimum cost," "maximum value," or "is it possible," and the constraints fit in memory (n ≤ 10^4 or so for 1-D, n × m ≤ 10^6 or so for 2-D), think DP first.
Where to go from here
This lesson built the foundation: what DP is, why it works, and the mechanical recipe. Fibonacci was the vehicle, not the destination.
The Dynamic Programming technique guide takes you through the full taxonomy — knapsack, LCS, LIS, edit distance, interval DP, tree DP — with worked examples at each difficulty level. That is where the patterns that actually appear in interviews live.
If the recursion in this lesson felt shaky, revisit Recursion first — DP memoization is recursion with memory, and if the base case / recursive case mental model is not solid, the DP layer will feel floating. Likewise, if the O(2^n) → O(n) jump felt hand-wavy, Big-O Notation has the accounting that makes it rigorous.
The tell that you have really internalized DP: when you see a new problem, you reach for pencil and paper, write "dp[i] = ___", and the recurrence falls out before you open your editor. That is the skill. Fibonacci is just the key that opens the door.
Frequently asked questions
▸What is dynamic programming in simple terms?
Dynamic programming is a technique for solving problems by breaking them into smaller subproblems, solving each one once, and storing the result so you never solve the same subproblem twice. The two properties that tell you DP applies are overlapping subproblems (the same smaller problem comes up repeatedly) and optimal substructure (the best solution to the big problem is built from best solutions to smaller ones).
▸What is the difference between memoization and tabulation in DP?
Memoization (top-down) is recursive — you write the natural recursion, add a cache, and let the recursion tree guide which subproblems get solved. Tabulation (bottom-up) is iterative — you fill a table starting from the smallest subproblems and build up. Both give the same asymptotic complexity; memoization is usually easier to derive, while tabulation avoids recursion stack overhead and is easier to optimize for space.
▸How do I know if a problem can be solved with dynamic programming?
Ask two questions: (1) Can I express the answer to a big instance in terms of answers to smaller instances? (2) Do those smaller instances repeat? If yes to both, DP applies. Common tell-tale problem shapes include "count the number of ways," "find the minimum/maximum cost," "is it possible to achieve X," and anything that asks you to optimize over sequences, grids, or subsets.
▸What is the time complexity of dynamic programming?
It depends on the problem, but the general rule is: O(number of distinct subproblems × work per subproblem). For Fibonacci, that is O(n) subproblems × O(1) work each = O(n). Many 2-D DP problems on an n×m grid are O(n·m). The key insight is that memoization turns an exponential recursion tree into a DAG where each node is computed once.
▸What is the difference between dynamic programming and divide and conquer?
Both break problems into subproblems. Divide and conquer (like merge sort) splits into independent, non-overlapping subproblems — you never solve the same piece twice. Dynamic programming applies when those subproblems overlap — when the same sub-instance is needed by multiple branches of the recursion. Caching the answer is what makes DP efficient; without overlap, there is nothing to cache.