~/data structures/balanced-bst
◆◆◆Advancedasked at Googleasked at Amazonasked at Microsoft

Balanced BSTs (AVL & Red-Black Trees)

Self-balancing trees fix the degenerate-BST problem by keeping height O(log n) guaranteed — which is why Java's TreeMap, C++'s std::map, and every ordered set in the standard library runs on one under the hood.

Complexity cheat sheet
Operation
Average
Worst
Searchheight bounded by rotation invariants
O(log n)
O(log n)
Insertone traversal down + O(1) or O(log n) rotations
O(log n)
O(log n)
Deletetrickier than insert; same asymptotic cost
O(log n)
O(log n)
Min / Maxleftmost / rightmost node
O(log n)
O(log n)
Predecessor / Successor
O(log n)
O(log n)
Range query [lo, hi]k = number of results
O(log n + k)
O(log n + k)
Floor / Ceil
O(log n)
O(log n)
// DEPTH
the full breakdown — requirements, capacity, evolution, trade-offs

The problem: sorted input kills a plain BST

Here is a scenario you have hit without knowing it. You are indexing database rows by auto-incremented ID. You insert them in order: 1, 2, 3, 4, 5, 6, 7. In a plain binary search tree, each new key is larger than everything already in the tree, so it always lands as the rightmost leaf:

flowchart TD
    A["1"] --> B[" "]
    A --> C["2"]
    C --> D[" "]
    C --> E["3"]
    E --> F[" "]
    E --> G["4"]
    G --> H[" "]
    G --> I["5"]
    style A fill:#7c5cff,color:#fff
    style C fill:#7c5cff,color:#fff
    style E fill:#7c5cff,color:#fff
    style G fill:#7c5cff,color:#fff
    style I fill:#7c5cff,color:#fff
    style B fill:transparent,stroke:transparent
    style D fill:transparent,stroke:transparent
    style F fill:#transparent,stroke:transparent
    style H fill:transparent,stroke:transparent

That is a linked list. Searching for 5 requires traversing every node from 1 down. Insert is O(n) too. You have lost everything the BST was supposed to give you.

The same degradation happens with reverse-sorted input (always goes left), or with any partially sorted sequence — which is precisely how log files, timestamps, and auto-incremented IDs arrive.

A self-balancing BST adds one rule: after every insert or delete, fix any height imbalance using rotations. The fix is local and cheap. The payoff is a worst-case height guarantee.

The fix: rotations

A rotation is the atomic unit of rebalancing. It is a local restructuring of three nodes that changes which node is the parent — without breaking the BST property. There are two kinds.

Right rotation

You have a node X with a heavy left subtree. Promote X's left child Y:

Before:          After:
    X               Y
   / \             / \
  Y   C    →     A   X
 / \                 / \
A   B               B   C

B moves from being Y's right child to being X's left child. Every value in B is still greater than Y and less than X — the BST ordering is preserved exactly. The only things that changed are three pointers. That is O(1).

Left rotation

The mirror image. Node X has a heavy right subtree; promote its right child:

Before:          After:
  X                 Y
 / \               / \
A   Y      →     X   C
   / \          / \
  B   C        A   B

Again, B moves — it was Y's left child, it becomes X's right child. Ordering is preserved.

flowchart LR
    subgraph before ["Before right rotation"]
        X1["X"] --> Y1["Y (left child)"]
        X1 --> C1["C"]
        Y1 --> A1["A"]
        Y1 --> B1["B"]
    end
    subgraph after ["After right rotation"]
        Y2["Y (new root)"] --> A2["A"]
        Y2 --> X2["X"]
        X2 --> B2["B"]
        X2 --> C2["C"]
    end
    before -.->|"rotate right on X"| after
    style X1 fill:#ff4d8d,color:#fff
    style X2 fill:#ff4d8d,color:#fff
    style Y1 fill:#7c5cff,color:#fff
    style Y2 fill:#7c5cff,color:#fff

A double rotation (left-right or right-left) handles the zig-zag cases — where the heavy child leans the opposite way from its parent. It is just two single rotations applied in sequence.

AVL trees: the strict balancer

An AVL tree attaches a balance factor to every node: the height of its right subtree minus the height of its left subtree. The invariant is simple:

Every node's balance factor must be -1, 0, or +1.

After each insert or delete, the tree walks back up to the root updating balance factors. The moment it finds a node with balance factor -2 or +2, it applies one or two rotations to fix it. The subtree shrinks back to its pre-insert height, so no further fixups are needed above it.

class AVLNode:
    def __init__(self, key):
        self.key = key
        self.left = self.right = None
        self.height = 1

def height(node):
    return node.height if node else 0

def balance_factor(node):
    return height(node.right) - height(node.left)

def update_height(node):
    node.height = 1 + max(height(node.left), height(node.right))

def rotate_right(y):
    x = y.left
    T2 = x.right
    x.right = y
    y.left = T2
    update_height(y)
    update_height(x)
    return x   # new root

def rotate_left(x):
    y = x.right
    T2 = y.left
    y.left = x
    x.right = T2
    update_height(x)
    update_height(y)
    return y   # new root

def rebalance(node):
    update_height(node)
    bf = balance_factor(node)

    # Left-heavy
    if bf == -2:
        if balance_factor(node.left) > 0:         # left-right case
            node.left = rotate_left(node.left)
        return rotate_right(node)

    # Right-heavy
    if bf == 2:
        if balance_factor(node.right) < 0:        # right-left case
            node.right = rotate_right(node.right)
        return rotate_left(node)

    return node   # already balanced

def insert(node, key):
    if not node:
        return AVLNode(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    else:
        return node   # duplicate; ignore
    return rebalance(node)

The height invariant — balance factor bounded at ±1 — means AVL tree height is at most 1.44 log₂ n. That is only 44% taller than the theoretical minimum. In practice, lookups are very fast because the tree stays tight.

The cost: AVL trees do more rotations than red-black trees on writes. Benchmarks on write-heavy workloads usually show red-black winning there, while AVL wins on read-heavy ones.

Red-black trees: the pragmatic balancer

Red-black trees replace the per-node height calculation with a color bit (red or black) and enforce four invariants:

  1. Every node is red or black.
  2. The root is black.
  3. No red node has a red child. (No two reds in a row.)
  4. Every path from any node down to any null leaf passes through the same number of black nodes.

These invariants together imply the tree's height is at most 2 log₂(n+1). That is weaker than AVL's 1.44 bound — but it requires at most 2 rotations per insert (and 3 per delete), compared to AVL's potentially O(log n) propagating updates.

flowchart TD
    R["13 ⬛ (black)"]
    R --> B1["8 🔴 (red)"]
    R --> B2["17 ⬛ (black)"]
    B1 --> C1["1 ⬛ (black)"]
    B1 --> C2["11 ⬛ (black)"]
    B2 --> C3["15 🔴 (red)"]
    B2 --> C4["25 🔴 (red)"]
    C1 --> D1["null ⬛"]
    C1 --> D2["6 🔴 (red)"]
    style R fill:#1a1a2e,color:#fff,stroke:#888
    style B1 fill:#ff4d8d,color:#fff
    style B2 fill:#1a1a2e,color:#fff,stroke:#888
    style C1 fill:#1a1a2e,color:#fff,stroke:#888
    style C2 fill:#1a1a2e,color:#fff,stroke:#888
    style C3 fill:#ff4d8d,color:#fff
    style C4 fill:#ff4d8d,color:#fff
    style D1 fill:#1a1a2e,color:#888,stroke:#888
    style D2 fill:#ff4d8d,color:#fff

You do not implement red-black trees from scratch in production (or in most interviews). What matters is understanding why the standard library reaches for them: they keep writes fast by limiting rotation cascades, and the height bound is good enough for all practical log n guarantees.

The fix-up after insert involves case analysis — recoloring and at most two rotations depending on whether the uncle node is red or black. It is mechanical. If you need to implement one, write the four violation cases with exhaustive tests and trust the invariants.

Balanced BST vs hash map: when to choose which

This is the practical question that shows up in system design interviews and real code. The answer is almost always one of these two:

You need thisUse this
O(1) average lookup by exact keyHash map
All keys in sorted order (iteration)Balanced BST
"All values between 100 and 200"Balanced BST (range query)
"Largest key ≤ x" (floor)Balanced BST
"Smallest key ≥ x" (ceil)Balanced BST
Predecessor / successor of a keyBalanced BST
Rank queries ("5th smallest key")BST with size augmentation
You only need min or maxHeap

Hash maps destroy ordering. The moment you need to ask "what is the next key after 42?" or "give me everything from January to March", you cannot answer efficiently with a hash map — you would have to dump all keys and sort them. A balanced BST keeps keys sorted at all times and can answer any of those queries in O(log n) by exploiting tree structure.

The tradeoff: hash map lookups are O(1) average vs BST's O(log n). For n = 1 million, log₂ n ≈ 20. So you are paying about 20x in lookup cost for the ability to do ordered operations. Whether that is worth it depends entirely on your access pattern.

Core operations in practice

You rarely implement a balanced BST from scratch. You use the standard library and call the right methods.

Python has no built-in balanced BST, but sortedcontainers.SortedList / SortedDict from PyPI delivers O(log n) operations backed by a sorted list structure. For interviews, bisect.bisect_left on a sorted array gives you floor/ceil in O(log n) when you do not need dynamic inserts:

from sortedcontainers import SortedDict

# ordered map: keys always sorted
sd = SortedDict()
sd[5] = "five"
sd[3] = "three"
sd[8] = "eight"

# range query: keys between 3 and 7 inclusive
keys_in_range = sd.irange(3, 7)        # generator: 3, 5
print(list(keys_in_range))             # [3, 5]

# floor: largest key ≤ 6
idx = sd.bisect_right(6) - 1
floor_key = sd.keys()[idx] if idx >= 0 else None   # 5

# sorted iteration: always in key order
for k, v in sd.items():
    print(k, v)   # 3 three, 5 five, 8 eight

JavaTreeMap<K, V> is the workhorse. It is a red-black tree. Know these methods cold:

TreeMap<Integer, String> map = new TreeMap<>();
map.put(5, "five");
map.put(3, "three");
map.put(8, "eight");

// floor/ceil
map.floorKey(6);        // 5   — largest key ≤ 6
map.ceilingKey(6);      // 8   — smallest key ≥ 6

// predecessor / successor
map.lowerKey(5);        // 3   — strictly less than 5
map.higherKey(5);       // 8   — strictly greater than 5

// range query: subMap returns a live view
map.subMap(3, true, 7, true).forEach((k, v) ->
    System.out.println(k + " → " + v)
);  // 3 → three, 5 → five

// min / max in O(log n)
map.firstKey();   // 3
map.lastKey();    // 8

TreeSet<E> gives you the same ordered operations without values — use it when you need an ordered set.

C++std::map and std::set are red-black trees. lower_bound(x) returns an iterator to the first key ≥ x; upper_bound(x) returns the first key > x. Subtract one for floor semantics.

Two worked problems

Problem 1: find the closest BST value

Given a BST and a target float, find the value in the tree closest to the target.

The key insight: you do not need to search the whole tree. At each node, the closest value is either the node itself or somewhere deeper in the direction the target pulls you. Walk the tree exactly as you would for a search, tracking the closest seen so far.

def closest_value(root, target):
    closest = root.val
    node = root
    while node:
        if abs(node.val - target) < abs(closest - target):
            closest = node.val
        if target < node.val:
            node = node.left
        elif target > node.val:
            node = node.right
        else:
            return node.val   # exact match
    return closest

Time O(log n) on a balanced tree. O(n) on a degenerate one — which is why you need balance.

Problem 2: count of range sums (hard)

Given an integer array nums and bounds [lower, upper], count the number of range sums S(i, j) (where S(i, j) = nums[i] + ... + nums[j]) that lie in [lower, upper].

The prefix sum trick converts this to: for each prefix sum P[j], count how many previous prefix sums P[i] satisfy lower ≤ P[j] - P[i] ≤ upper, which means P[j] - upper ≤ P[i] ≤ P[j] - lower.

That is a range count query on a dynamic set of values — precisely what a balanced BST with augmented subtree sizes handles efficiently. Java's TreeMap does not give you rank queries directly, but a sorted list with binary search approximates it:

from sortedcontainers import SortedList

def count_range_sum(nums, lower, upper):
    from itertools import accumulate
    prefixes = SortedList([0])
    count = 0
    running = 0
    for num in nums:
        running += num
        # count prefix sums in [running - upper, running - lower]
        lo = prefixes.bisect_left(running - upper)
        hi = prefixes.bisect_right(running - lower)
        count += hi - lo
        prefixes.add(running)
    return count

Each bisect_left/bisect_right is O(log n) on the sorted structure. Total: O(n log n). The naive brute force is O(n²). The balanced structure is doing the heavy lifting.

Pitfalls

Using a plain BST and trusting "average case." Average case for a BST requires uniformly random insert order. Time-series data, auto-incremented IDs, alphabetically sorted names — all of these are partially sorted. They create skewed trees. Use a balanced implementation or do not use a BST at all.

Trying to implement AVL or red-black in an interview without a strong reason. If the interviewer asks you to implement a balanced BST from scratch, start with AVL (simpler rotation logic) and be explicit about the invariant you are maintaining. But in most interviews, the right answer is "I'd use TreeMap / SortedDict here" — reaching for the standard library is not cheating, it is what you would do in production.

Forgetting that deletion is harder than insertion. Deletion from an AVL tree has more cases than insertion because the node to delete might have two children (replace it with its in-order successor, then delete the successor from the right subtree). The fix-up can propagate all the way to the root. Write exhaustive tests before shipping any balanced BST implementation.

Using a balanced BST when you only need a heap. If your entire use case is "give me the current minimum" with inserts and extracts, a heap is simpler, faster in practice (array vs. node-based), and already in the standard library. The heap cannot do range queries or sorted iteration — but if you do not need those, do not pay for them.

Confusing balance with sort. The tree is always in BST order — left subtree < node < right subtree — but "balanced" refers to height, not value distribution. You can have a perfectly balanced tree with very uneven key spacing (e.g., keys 1, 1000, 1001 with 1 at the root).

When NOT to use a balanced BST

The balanced BST is the right tool when you need dynamic ordered data. It is the wrong tool when:

  • You only ever look up by exact key. Use a hash map — O(1) average is hard to beat, and you get no benefit from the ordering a BST maintains.
  • You only need the min or max element. Use a heap. It is faster in practice (flat array, no node overhead) and O(log n) for the same insert/extract operations.
  • Your data is static and you need range queries. Build a sorted array and use binary search. Lookups are O(log n), no pointer chasing, and the cache performance is dramatically better than a tree of heap-allocated nodes.
  • You need O(1) amortized inserts with occasional sorted access. Consider a skip list or a B-tree (which are better for disk-based storage due to high branching factor).
  • You are writing Python and just need sorted iteration once. sorted(my_dict.keys()) is O(n log n) but has very low constant factors because Timsort is heavily optimized. Only reach for SortedDict if you have many dynamic inserts and need ordered queries repeatedly.

The big picture: balanced BSTs own the space where hash maps fail because you need ordering, and heaps fail because you need more than just the extremum. If you are iterating in order, doing range scans, or computing floor/ceil on a dynamically changing set, a balanced BST is not one option among many — it is the right answer. Knowing when you are in that territory, and knowing that Java's TreeMap and C++'s std::map are already there waiting for you, is the practical payoff of understanding what rotations are buying you.

Check your Big-O intuition if the logarithmic bounds here feel fuzzy — the analysis of why height O(log n) translates directly into operation O(log n) is worth making concrete once.

// FAQ

Frequently asked questions

What does "balanced" mean for a binary search tree?

A balanced BST is one that actively limits its own height. A perfectly balanced tree of n nodes has height ⌊log₂ n⌋ — that is the theoretical minimum. AVL trees guarantee height ≤ 1.44 log₂ n by enforcing that every node's left and right subtrees differ in height by at most 1. Red-black trees allow a slightly looser bound (height ≤ 2 log₂(n+1)) in exchange for fewer rotations on insert and delete. Both are O(log n) for all operations in the worst case, which is the point — an unbalanced BST can degrade to O(n).

What is a tree rotation and why does it help?

A rotation is a local restructuring of three nodes that preserves the BST ordering while changing the height of subtrees. A right rotation on node X promotes X's left child Y to X's position, makes X the right child of Y, and moves Y's old right subtree to become X's new left subtree. The BST property is maintained because the relative ordering of all values stays the same — you are only changing which node is the parent. Rotations are O(1) because they only update a constant number of pointers, and they are the primitive self-balancing trees use to correct height imbalances after an insert or delete.

What is the difference between an AVL tree and a red-black tree?

AVL trees enforce a strict balance factor of -1, 0, or +1 at every node and do more rotations on insert/delete to maintain it. This gives slightly shorter trees and faster lookups. Red-black trees relax the balance rule with color invariants (every path from root to leaf has the same number of black nodes) and do fewer rotations on insert — at most 2 per insertion — making writes faster. In practice: AVL trees are a better fit for read-heavy workloads; red-black trees are preferred in implementations that mix reads and writes, which is why Linux kernel's scheduler, Java's TreeMap, and C++'s std::map all use red-black trees.

When should I use a balanced BST instead of a hash map?

Use a balanced BST when you need anything that requires ordering: range queries ("give me all keys between 100 and 200"), floor/ceil ("what is the largest key ≤ x?"), predecessor/successor, sorted iteration, or the k-th smallest element. Hash maps give O(1) average lookup but destroy ordering — you can't efficiently range-query a hash map. Balanced BSTs give O(log n) lookup in exchange for keeping the keys sorted at all times. If you only ever look up by exact key and never need sorted output, use the hash map.

Is there ever a reason to use a plain (unbalanced) BST?

Rarely in production. An unbalanced BST has O(n) worst-case on all operations when data arrives in sorted or nearly-sorted order, which is common in real workloads (log files, time-series, auto-incremented IDs). The only contexts where it makes sense are: (1) you can prove your insert order will be random, or (2) you are using a splay tree variant that self-adjusts toward recently accessed nodes — but even then you have an amortized O(log n) bound, not a worst-case guarantee. Default to a balanced variant.

// RELATED

You may also like