◆◆Intermediateasked at Googleasked at Amazonasked at Meta

Tries (Prefix Trees)

The data structure powering autocomplete, spell-check, and IP routing — a tree where each edge is a character and every path from the root spells a prefix. Insert, search, and prefix queries all run in O(L) on word length alone, completely independent of how many words you store.

11 min read2026-01-20Ironclad Academy

#tries #trees #strings #data-structures

Complexity cheat sheet

Operation

Average

Worst

Insert wordL = word length; create at most L nodes

O(L)

Search wordwalk L edges; O(1) per character

O(L)

startsWith (prefix check)same walk, no end-of-word check

O(L)

Delete wordwalk + prune dead branches on the way back

O(L)

Autocomplete (enumerate all words under prefix)L to reach prefix node, then DFS over K results; N = total stored characters

O(L + K)

O(L + N)

// DEPTH

the full breakdown — requirements, capacity, evolution, trade-offs

The mental model: every shared prefix is a shared node

Here is the thing that clicks it into place. When you store "cat" and "car" in a hash table, you get two independent entries. The shared prefix "ca" is duplicated in both strings — stored twice, checked independently. A trie refuses to do that. It insists: if two words start with the same character, they share the same first node. If they share the first two characters, they share the same first two nodes.

The result is that the trie's structure is the prefix structure of your word list. That is not a side effect — it is the whole design. "Dog" branches off on its own from the root because it shares no prefix with "cat" or "car." "Car" and "cat" stay together until the fourth character, where they diverge into "r" and "t" nodes.

flowchart TD
    root(( root ))
    root -->|c| c((c))
    root -->|d| d((d))
    c -->|a| ca((a))
    ca -->|t| cat((t))
    ca -->|r| car((r))
    d -->|o| do((o))
    do -->|g| dog((g))
    subgraph "words stored"
      W1["cat ✓"]
      W2["car ✓"]
      W3["dog ✓"]
    end
    cat -.-> W1
    car -.-> W2
    dog -.-> W3
    style root fill:#7c5cff,color:#fff
    style ca fill:#22d3ee,color:#0a0a0f
    style cat fill:#00ff9d,color:#0a0a0f
    style car fill:#00ff9d,color:#0a0a0f
    style dog fill:#00ff9d,color:#0a0a0f

Now add "card" to this trie. You walk "c" → "a" → "r" — all three already exist, no new nodes — and attach a new "d" child under "r". You also need to mark "r" as is_end = True (it already is for "car") and add is_end = True on the new "d" node. Total cost: one new node. If you later add "care," "carp," and "careful," you pay one node per new character.

That's why insert is O(L): in the worst case you create L new nodes. In the best case (a prefix already fully exists), you do L lookups and set one flag. Either way: O(L).

Node structure: array vs. dict

A trie node needs two things: a way to find child nodes by character, and a flag for whether a word ends here. Two common choices:

Array of 26 children (for lowercase English):

class TrieNode:
    def __init__(self):
        self.children = [None] * 26   # index 0 = 'a', 25 = 'z'
        self.is_end = False

    def index(self, ch):
        return ord(ch) - ord('a')

Lookup is O(1) — index into the array directly. But every node allocates 26 pointers regardless of how many children it actually has. For a deep, narrow trie this burns memory.

Hash map per node:

class TrieNode:
    def __init__(self):
        self.children = {}     # char → TrieNode
        self.is_end = False

Uses memory proportional to the actual number of children. Slightly slower in practice (hash computation vs. integer index), but works for large or variable alphabets — URLs, DNA sequences, Unicode strings. In Python interviews, the dict version is what most people write and it is fine.

The complexity is identical either way. Pick the array version if you want the interviewer to know you thought about cache performance; pick the dict version if you want legible code you can write without bugs under pressure.

The three core operations

Insert

Walk character by character. At each step, look up the child for the current character. If it doesn't exist, create it. After consuming all characters, set is_end = True on the final node.

def insert(self, word: str) -> None:
    node = self.root
    for ch in word:
        if ch not in node.children:
            node.children[ch] = TrieNode()
        node = node.children[ch]
    node.is_end = True

One iteration per character. O(L).

Search

Walk character by character. If any character is missing from the current node's children, the word isn't in the trie — return False immediately. If you consume all characters and land on a node with is_end = True, the word exists.

def search(self, word: str) -> bool:
    node = self.root
    for ch in word:
        if ch not in node.children:
            return False
        node = node.children[ch]
    return node.is_end     # path exists, but is it a complete word?

That last check is critical. The trie might contain "card" without containing "car" — the path "c → a → r" exists, but unless "r" is marked as an end node, "car" is not a word in this trie. Forgetting is_end is the single most common bug.

startsWith (prefix check)

Identical to search, minus the is_end check. You just want to know if any stored word starts with the given prefix.

def starts_with(self, prefix: str) -> bool:
    node = self.root
    for ch in prefix:
        if ch not in node.children:
            return False
        node = node.children[ch]
    return True    # the prefix exists; don't care if it's a complete word

O(L) on prefix length. A hash table cannot answer this without scanning every stored key. The trie answers it by walking down a single path.

The full implementation

Here's a clean implementation of LeetCode's classic Implement Trie problem (208):

class TrieNode:
    def __init__(self):
        self.children: dict[str, 'TrieNode'] = {}
        self.is_end: bool = False


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.is_end = True

    def search(self, word: str) -> bool:
        node = self._find_node(word)
        return node is not None and node.is_end

    def starts_with(self, prefix: str) -> bool:
        return self._find_node(prefix) is not None

    def _find_node(self, s: str) -> 'TrieNode | None':
        """Walk the trie following characters in s. Return the final node or None."""
        node = self.root
        for ch in s:
            if ch not in node.children:
                return None
            node = node.children[ch]
        return node

Factor out _find_node once you realize search and starts_with differ by exactly one line. Less code to mistype under interview pressure.

Autocomplete: the pattern that makes tries famous

Given a trie of words and a prefix typed by a user, return all words in the trie that start with that prefix. This is what Google, the macOS Spotlight, your IDE, and your phone keyboard all do (conceptually — at scale they use compressed variants, but the logic is this).

def autocomplete(trie: Trie, prefix: str) -> list[str]:
    # Step 1: walk to the prefix node
    node = trie._find_node(prefix)
    if node is None:
        return []          # no word in the trie starts with this prefix

    # Step 2: DFS from the prefix node, collecting complete words
    results = []
    _dfs(node, prefix, results)
    return results


def _dfs(node: TrieNode, current: str, results: list[str]) -> None:
    if node.is_end:
        results.append(current)           # found a complete word
    for ch, child in node.children.items():
        _dfs(child, current + ch, results)

The total cost is O(L) to reach the prefix node plus O(total characters in matching words) for the DFS. If you have a million words but only 10 start with "xyz," you pay O(3 + characters in those 10 words). The rest of the trie is never touched.

This is the fundamental advantage over a sorted list of words. A sorted list does binary search in O(log N + K) where N is the dictionary size and K is results — the log N factor depends on how big your dictionary is. The trie's O(L + K) does not. The dictionary could grow from 10,000 words to 10 billion words without changing autocomplete cost.

Worked problem: word search II

Given a grid of characters and a list of words, find all words that exist in the grid (LeetCode 212). A cell can be used once per word; valid paths go up/down/left/right.

The naive approach: for every word, run DFS on the grid trying to match it. If there are W words and the grid is R×C, this is O(W × R × C × 4^L) — catastrophically slow when W is large.

The trie approach: build a trie from all W words, then run one DFS over the grid. At each position, you're simultaneously matching all W words at once.

def find_words(board: list[list[str]], words: list[str]) -> list[str]:
    trie = Trie()
    for word in words:
        trie.insert(word)

    rows, cols = len(board), len(board[0])
    found = set()

    def dfs(node: TrieNode, r: int, c: int, path: str) -> None:
        ch = board[r][c]
        if ch not in node.children:
            return          # this branch can't match any remaining word
        next_node = node.children[ch]
        path += ch
        if next_node.is_end:
            found.add(path)
            # don't return: a longer word might still match below

        board[r][c] = '#'   # mark visited
        for dr, dc in [(-1,0),(1,0),(0,-1),(0,1)]:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and board[nr][nc] != '#':
                dfs(next_node, nr, nc, path)
        board[r][c] = ch    # restore

    for r in range(rows):
        for c in range(cols):
            dfs(trie.root, r, c, '')

    return list(found)

The trie prunes the search early — if the current path doesn't match any prefix in the dictionary, you backtrack immediately. One DFS pass handles all W words simultaneously because they share prefix nodes in the trie.

Complexity and space trade-offs

Operation	Trie	Hash set
Insert	O(L)	O(L)
Exact search	O(L)	O(L) average
Prefix check	O(L)	O(N · L) — scan all keys
Autocomplete	O(L + K)	O(N · L) — scan all keys
Memory	O(total characters)	O(total characters)

For exact lookups, the trie and hash set are asymptotically equivalent — the hash set often wins on constant factors because a good hash function beats walking a tree. The trie dominates the moment you need anything prefix-related.

Space warning. A 26-child-array trie node on a 64-bit system costs 26 × 8 = 208 bytes per node, plus overhead. A trie holding 50,000 English words might create 150,000+ nodes — that is over 30 MB for the naive representation. Radix trees (compressed tries that merge single-child chains into one edge) cut this dramatically and are what real systems use. For interviews, the naive node structure is always fine.

Pitfalls people actually hit

Forgetting is_end on search. You check whether the path exists in the trie, declare success, and return True — but the trie contains "cards," not "car." The path for "car" exists as a prefix, but there's no end marker. Search should return False. This bug is so common that it's practically what the problem tests.

Not restoring visited cells in grid DFS. In word search problems you mark cells as visited before recursing and must restore them after. Forgetting this means a single path through the grid permanently corrupts the board for later searches.

Building a trie when a hash set would do. If you only need "is this word in the dictionary?" and you'll never ask prefix questions, a set is simpler and faster. Don't reach for a trie out of habit.

Assuming all characters are lowercase letters. Many problems guarantee this; many do not. A URL trie might include ., /, -. An IP-prefix trie works bit-by-bit, with an alphabet of size 2. Know your alphabet before you hard-code [None] * 26.

Not pruning dead branches after deletion. If you implement delete and don't walk back up removing now-childless nodes, deleted words leave zombie paths in the trie. The paths don't match complete words (their is_end is cleared), but they waste memory. For interviews, skip deletion unless asked.

When NOT to use a trie

Situation	Better option
Only exact-match lookups, no prefix queries	Hash table — simpler, faster constant factors
Ordered key traversal or range queries on strings	Sorted array + binary search, or a binary tree
Sparse alphabet or large character sets	Hash-map per node or a compressed trie
Graph-shaped relationships between words	A proper graph
You control neither insert nor lookup patterns	Profile first; hash set almost always wins for pure lookup

The trie earns its place when you need to do something prefix-shaped with strings: autocomplete, spell-check suggestions, IP longest-prefix match, counting words with a given prefix, or finding the longest common prefix in a set of strings. If the problem is asking you to traverse a string character-by-character while consulting a dictionary, a trie is almost certainly the intended structure.

One more concrete tell: if you catch yourself writing code that iterates over all dictionary words to check whether any starts with the user's input, stop and build a trie. You have just invented O(N·L) where O(L) exists.

The tell in interview problems

The trie signal is this: you have a collection of strings, and queries arrive that care about prefixes or shared structure. Not "find the string," but "find all strings starting with this," or "how many strings share this prefix," or "what is the longest prefix of this string that exists in my set."

Once you see it, the approach is mechanical:

Build the trie — one insert call per string, O(L) each.
For each query, walk the trie to the end of the prefix — O(L).
From there, DFS or count depending on what you need.

The code is about 30 lines of the same pattern repeated. Nail the _find_node helper, be careful with is_end, and you'll handle the entire family of problems cleanly.

// KEY TAKEAWAYS

▸Every path from the root to an end-of-word node spells a complete word; every shared prefix is a shared node — that is the whole data model.
▸Insert, search, and startsWith are all O(L) on word length — the total number of stored words does not affect any of these operations.
▸A trie with a 26-element array per node is fast but memory-hungry; a dict-per-node is leaner for sparse alphabets and equally fast asymptotically.
▸The signature pattern: autocomplete, "count words with this prefix," longest common prefix, word search in a grid — anything where you process strings character-by-character with shared structure.
▸When you only need exact-match lookups, a hash set beats a trie on simplicity and constant factors; the trie earns its place when prefix structure matters.

// FAQ

Frequently asked questions

▸What is a trie and why is it called a prefix tree?

A trie is a tree where each node represents a character, and the path from the root to any node spells out a string prefix. The name "trie" comes from re-TRIE-val. It is called a prefix tree because its structure naturally groups all words that share a common prefix under the same subtree — searching for words starting with "ca" means you walk to the "c" node, then the "a" child, and every path downward from there is a word beginning with "ca".

▸Is a trie faster than a hash map for string lookup?

For exact lookups, a hash map is often comparable or faster in practice — O(L) but with a lower constant, no tree traversal overhead. The trie wins decisively when you need prefix queries: "does any stored word start with this prefix?" A hash map cannot answer that without scanning all keys. The trie answers in O(L) on prefix length. Tries also share common prefixes in memory, which matters when storing large dictionaries with heavy overlap like "inter-", "un-", "pre-".

▸How much memory does a trie use?

A naive trie node with an array of 26 children (one per lowercase letter) uses 26 pointers per node. For a dictionary of 50,000 English words you might have 150,000 nodes — the exact count depends on how many prefixes are shared. Words with common prefixes share nodes, which saves memory. Hash-map nodes (one dict per node) use less memory when the branching factor is low and more when it is high. In practice, compressed tries (Patricia tries, radix trees) are used in production systems to cut memory significantly.

▸What is the difference between a trie and a binary search tree?

A binary search tree compares whole keys at each node — you go left or right based on whether the search key is smaller or larger. A trie never compares whole keys; it consumes one character per level. This means trie depth equals word length, not log(n) of the dictionary size. A trie with one million words is no deeper than the longest word in it (usually around 25 characters). A BST over the same million words is about 20 levels deep — comparable here, but the trie has the decisive edge for prefix queries.

▸When should I use a trie instead of a set of strings?

Use a trie when you need prefix queries, autocomplete, or the longest-common-prefix among a set of words. A plain hash set answers "is this exact word in my dictionary?" in O(L) but cannot answer "give me all words starting with XYZ" without a full scan. The trie answers both in O(L) on the prefix length. If you only ever need exact membership checks and never care about prefixes or enumeration, a hash set is simpler and often faster.

← previous

Union-Find (Disjoint Set Union)

Strings

// RELATED

Tries (Prefix Trees)

The mental model: every shared prefix is a shared node

Node structure: array vs. dict

The three core operations

Insert

Search

startsWith (prefix check)

The full implementation

Autocomplete: the pattern that makes tries famous

Worked problem: word search II

Complexity and space trade-offs

Pitfalls people actually hit

When NOT to use a trie

The tell in interview problems

Frequently asked questions

You may also like

Union-Find (Disjoint Set Union)

Strings

Segment Trees