Tries (Prefix Trees)
The data structure powering autocomplete, spell-check, and IP routing — a tree where each edge is a character and every path from the root spells a prefix. Insert, search, and prefix queries all run in O(L) on word length alone, completely independent of how many words you store.
The mental model: every shared prefix is a shared node
Here is the thing that clicks it into place. When you store "cat" and "car" in a hash table, you get two independent entries. The shared prefix "ca" is duplicated in both strings — stored twice, checked independently. A trie refuses to do that. It insists: if two words start with the same character, they share the same first node. If they share the first two characters, they share the same first two nodes.
The result is that the trie's structure is the prefix structure of your word list. That is not a side effect — it is the whole design. "Dog" branches off on its own from the root because it shares no prefix with "cat" or "car." "Car" and "cat" stay together until the fourth character, where they diverge into "r" and "t" nodes.
flowchart TD
root(( root ))
root -->|c| c((c))
root -->|d| d((d))
c -->|a| ca((a))
ca -->|t| cat((t))
ca -->|r| car((r))
d -->|o| do((o))
do -->|g| dog((g))
subgraph "words stored"
W1["cat ✓"]
W2["car ✓"]
W3["dog ✓"]
end
cat -.-> W1
car -.-> W2
dog -.-> W3
style root fill:#7c5cff,color:#fff
style ca fill:#22d3ee,color:#0a0a0f
style cat fill:#00ff9d,color:#0a0a0f
style car fill:#00ff9d,color:#0a0a0f
style dog fill:#00ff9d,color:#0a0a0f
Now add "card" to this trie. You walk "c" → "a" → "r" — all three already exist, no new nodes — and attach a new "d" child under "r". You also need to mark "r" as is_end = True (it already is for "car") and add is_end = True on the new "d" node. Total cost: one new node. If you later add "care," "carp," and "careful," you pay one node per new character.
That's why insert is O(L): in the worst case you create L new nodes. In the best case (a prefix already fully exists), you do L lookups and set one flag. Either way: O(L).
Node structure: array vs. dict
A trie node needs two things: a way to find child nodes by character, and a flag for whether a word ends here. Two common choices:
Array of 26 children (for lowercase English):
class TrieNode:
def __init__(self):
self.children = [None] * 26 # index 0 = 'a', 25 = 'z'
self.is_end = False
def index(self, ch):
return ord(ch) - ord('a')
Lookup is O(1) — index into the array directly. But every node allocates 26 pointers regardless of how many children it actually has. For a deep, narrow trie this burns memory.
Hash map per node:
class TrieNode:
def __init__(self):
self.children = {} # char → TrieNode
self.is_end = False
Uses memory proportional to the actual number of children. Slightly slower in practice (hash computation vs. integer index), but works for large or variable alphabets — URLs, DNA sequences, Unicode strings. In Python interviews, the dict version is what most people write and it is fine.
The complexity is identical either way. Pick the array version if you want the interviewer to know you thought about cache performance; pick the dict version if you want legible code you can write without bugs under pressure.
The three core operations
Insert
Walk character by character. At each step, look up the child for the current character. If it doesn't exist, create it. After consuming all characters, set is_end = True on the final node.
def insert(self, word: str) -> None:
node = self.root
for ch in word:
if ch not in node.children:
node.children[ch] = TrieNode()
node = node.children[ch]
node.is_end = True
One iteration per character. O(L).
Search
Walk character by character. If any character is missing from the current node's children, the word isn't in the trie — return False immediately. If you consume all characters and land on a node with is_end = True, the word exists.
def search(self, word: str) -> bool:
node = self.root
for ch in word:
if ch not in node.children:
return False
node = node.children[ch]
return node.is_end # path exists, but is it a complete word?
That last check is critical. The trie might contain "card" without containing "car" — the path "c → a → r" exists, but unless "r" is marked as an end node, "car" is not a word in this trie. Forgetting is_end is the single most common bug.
startsWith (prefix check)
Identical to search, minus the is_end check. You just want to know if any stored word starts with the given prefix.
def starts_with(self, prefix: str) -> bool:
node = self.root
for ch in prefix:
if ch not in node.children:
return False
node = node.children[ch]
return True # the prefix exists; don't care if it's a complete word
O(L) on prefix length. A hash table cannot answer this without scanning every stored key. The trie answers it by walking down a single path.
The full implementation
Here's a clean implementation of LeetCode's classic Implement Trie problem (208):
class TrieNode:
def __init__(self):
self.children: dict[str, 'TrieNode'] = {}
self.is_end: bool = False
class Trie:
def __init__(self):
self.root = TrieNode()
def insert(self, word: str) -> None:
node = self.root
for ch in word:
if ch not in node.children:
node.children[ch] = TrieNode()
node = node.children[ch]
node.is_end = True
def search(self, word: str) -> bool:
node = self._find_node(word)
return node is not None and node.is_end
def starts_with(self, prefix: str) -> bool:
return self._find_node(prefix) is not None
def _find_node(self, s: str) -> 'TrieNode | None':
"""Walk the trie following characters in s. Return the final node or None."""
node = self.root
for ch in s:
if ch not in node.children:
return None
node = node.children[ch]
return node
Factor out _find_node once you realize search and starts_with differ by exactly one line. Less code to mistype under interview pressure.
Autocomplete: the pattern that makes tries famous
Given a trie of words and a prefix typed by a user, return all words in the trie that start with that prefix. This is what Google, the macOS Spotlight, your IDE, and your phone keyboard all do (conceptually — at scale they use compressed variants, but the logic is this).
def autocomplete(trie: Trie, prefix: str) -> list[str]:
# Step 1: walk to the prefix node
node = trie._find_node(prefix)
if node is None:
return [] # no word in the trie starts with this prefix
# Step 2: DFS from the prefix node, collecting complete words
results = []
_dfs(node, prefix, results)
return results
def _dfs(node: TrieNode, current: str, results: list[str]) -> None:
if node.is_end:
results.append(current) # found a complete word
for ch, child in node.children.items():
_dfs(child, current + ch, results)
The total cost is O(L) to reach the prefix node plus O(total characters in matching words) for the DFS. If you have a million words but only 10 start with "xyz," you pay O(3 + characters in those 10 words). The rest of the trie is never touched.
This is the fundamental advantage over a sorted list of words. A sorted list does binary search in O(log N + K) where N is the dictionary size and K is results — the log N factor depends on how big your dictionary is. The trie's O(L + K) does not. The dictionary could grow from 10,000 words to 10 billion words without changing autocomplete cost.
Worked problem: word search II
Given a grid of characters and a list of words, find all words that exist in the grid (LeetCode 212). A cell can be used once per word; valid paths go up/down/left/right.
The naive approach: for every word, run DFS on the grid trying to match it. If there are W words and the grid is R×C, this is O(W × R × C × 4^L) — catastrophically slow when W is large.
The trie approach: build a trie from all W words, then run one DFS over the grid. At each position, you're simultaneously matching all W words at once.
def find_words(board: list[list[str]], words: list[str]) -> list[str]:
trie = Trie()
for word in words:
trie.insert(word)
rows, cols = len(board), len(board[0])
found = set()
def dfs(node: TrieNode, r: int, c: int, path: str) -> None:
ch = board[r][c]
if ch not in node.children:
return # this branch can't match any remaining word
next_node = node.children[ch]
path += ch
if next_node.is_end:
found.add(path)
# don't return: a longer word might still match below
board[r][c] = '#' # mark visited
for dr, dc in [(-1,0),(1,0),(0,-1),(0,1)]:
nr, nc = r + dr, c + dc
if 0 <= nr < rows and 0 <= nc < cols and board[nr][nc] != '#':
dfs(next_node, nr, nc, path)
board[r][c] = ch # restore
for r in range(rows):
for c in range(cols):
dfs(trie.root, r, c, '')
return list(found)
The trie prunes the search early — if the current path doesn't match any prefix in the dictionary, you backtrack immediately. One DFS pass handles all W words simultaneously because they share prefix nodes in the trie.
Complexity and space trade-offs
| Operation | Trie | Hash set |
|---|---|---|
| Insert | O(L) | O(L) |
| Exact search | O(L) | O(L) average |
| Prefix check | O(L) | O(N · L) — scan all keys |
| Autocomplete | O(L + K) | O(N · L) — scan all keys |
| Memory | O(total characters) | O(total characters) |
For exact lookups, the trie and hash set are asymptotically equivalent — the hash set often wins on constant factors because a good hash function beats walking a tree. The trie dominates the moment you need anything prefix-related.
Space warning. A 26-child-array trie node on a 64-bit system costs 26 × 8 = 208 bytes per node, plus overhead. A trie holding 50,000 English words might create 150,000+ nodes — that is over 30 MB for the naive representation. Radix trees (compressed tries that merge single-child chains into one edge) cut this dramatically and are what real systems use. For interviews, the naive node structure is always fine.
Pitfalls people actually hit
Forgetting is_end on search. You check whether the path exists in the trie, declare success, and return True — but the trie contains "cards," not "car." The path for "car" exists as a prefix, but there's no end marker. Search should return False. This bug is so common that it's practically what the problem tests.
Not restoring visited cells in grid DFS. In word search problems you mark cells as visited before recursing and must restore them after. Forgetting this means a single path through the grid permanently corrupts the board for later searches.
Building a trie when a hash set would do. If you only need "is this word in the dictionary?" and you'll never ask prefix questions, a set is simpler and faster. Don't reach for a trie out of habit.
Assuming all characters are lowercase letters. Many problems guarantee this; many do not. A URL trie might include ., /, -. An IP-prefix trie works bit-by-bit, with an alphabet of size 2. Know your alphabet before you hard-code [None] * 26.
Not pruning dead branches after deletion. If you implement delete and don't walk back up removing now-childless nodes, deleted words leave zombie paths in the trie. The paths don't match complete words (their is_end is cleared), but they waste memory. For interviews, skip deletion unless asked.
When NOT to use a trie
| Situation | Better option |
|---|---|
| Only exact-match lookups, no prefix queries | Hash table — simpler, faster constant factors |
| Ordered key traversal or range queries on strings | Sorted array + binary search, or a binary tree |
| Sparse alphabet or large character sets | Hash-map per node or a compressed trie |
| Graph-shaped relationships between words | A proper graph |
| You control neither insert nor lookup patterns | Profile first; hash set almost always wins for pure lookup |
The trie earns its place when you need to do something prefix-shaped with strings: autocomplete, spell-check suggestions, IP longest-prefix match, counting words with a given prefix, or finding the longest common prefix in a set of strings. If the problem is asking you to traverse a string character-by-character while consulting a dictionary, a trie is almost certainly the intended structure.
One more concrete tell: if you catch yourself writing code that iterates over all dictionary words to check whether any starts with the user's input, stop and build a trie. You have just invented O(N·L) where O(L) exists.
The tell in interview problems
The trie signal is this: you have a collection of strings, and queries arrive that care about prefixes or shared structure. Not "find the string," but "find all strings starting with this," or "how many strings share this prefix," or "what is the longest prefix of this string that exists in my set."
Once you see it, the approach is mechanical:
- Build the trie — one insert call per string, O(L) each.
- For each query, walk the trie to the end of the prefix — O(L).
- From there, DFS or count depending on what you need.
The code is about 30 lines of the same pattern repeated. Nail the _find_node helper, be careful with is_end, and you'll handle the entire family of problems cleanly.
Frequently asked questions
▸What is a trie and why is it called a prefix tree?
A trie is a tree where each node represents a character, and the path from the root to any node spells out a string prefix. The name "trie" comes from re-TRIE-val. It is called a prefix tree because its structure naturally groups all words that share a common prefix under the same subtree — searching for words starting with "ca" means you walk to the "c" node, then the "a" child, and every path downward from there is a word beginning with "ca".
▸Is a trie faster than a hash map for string lookup?
For exact lookups, a hash map is often comparable or faster in practice — O(L) but with a lower constant, no tree traversal overhead. The trie wins decisively when you need prefix queries: "does any stored word start with this prefix?" A hash map cannot answer that without scanning all keys. The trie answers in O(L) on prefix length. Tries also share common prefixes in memory, which matters when storing large dictionaries with heavy overlap like "inter-", "un-", "pre-".
▸How much memory does a trie use?
A naive trie node with an array of 26 children (one per lowercase letter) uses 26 pointers per node. For a dictionary of 50,000 English words you might have 150,000 nodes — the exact count depends on how many prefixes are shared. Words with common prefixes share nodes, which saves memory. Hash-map nodes (one dict per node) use less memory when the branching factor is low and more when it is high. In practice, compressed tries (Patricia tries, radix trees) are used in production systems to cut memory significantly.
▸What is the difference between a trie and a binary search tree?
A binary search tree compares whole keys at each node — you go left or right based on whether the search key is smaller or larger. A trie never compares whole keys; it consumes one character per level. This means trie depth equals word length, not log(n) of the dictionary size. A trie with one million words is no deeper than the longest word in it (usually around 25 characters). A BST over the same million words is about 20 levels deep — comparable here, but the trie has the decisive edge for prefix queries.
▸When should I use a trie instead of a set of strings?
Use a trie when you need prefix queries, autocomplete, or the longest-common-prefix among a set of words. A plain hash set answers "is this exact word in my dictionary?" in O(L) but cannot answer "give me all words starting with XYZ" without a full scan. The trie answers both in O(L) on the prefix length. If you only ever need exact membership checks and never care about prefixes or enumeration, a hash set is simpler and often faster.
You may also like
Union-Find (Disjoint Set Union)
The near-magic data structure for connectivity queries — are these two nodes in the same component? Nearly O(1) per operation with union by rank and path compression. Kruskal's MST, redundant connections, account merging, and more.
Strings
Strings look like simple text, but they hide a brutal trap: naive concatenation in a loop is O(n²). Learn why, how encoding actually works, and the handful of patterns — sliding window, two pointers, hashing — that solve 80% of string interview problems.
Segment Trees
Segment trees answer range queries AND handle point or range updates in O(log n) — the structure to reach for when prefix sums aren't enough and you need both reads and writes at scale.