~/data structures/fenwick-tree
◆◆◆Advancedasked at Google

Fenwick Trees (Binary Indexed Trees)

Fenwick trees give you O(log n) prefix-sum queries AND point updates in ~10 lines of code — leaner and faster in practice than a segment tree for this exact job. Learn the lowbit trick, build the tree, and know when a BIT beats everything else.

Complexity cheat sheet
Operation
Average
Worst
Buildn point updates, each O(log n), but the tight bound is O(n)
O(n)
O(n)
Point updatewalk up the tree via i += i & -i
O(log n)
O(log n)
Prefix sum query [1..i]walk down via i -= i & -i
O(log n)
O(log n)
Range sum query [l..r]prefix(r) - prefix(l-1)
O(log n)
O(log n)
Point query (single element)range query of length 1
O(log n)
O(log n)
Range update (difference array trick)two point updates on a difference BIT
O(log n)
O(log n)
// DEPTH
the full breakdown — requirements, capacity, evolution, trade-offs

The moment you realize you need this

Here's a concrete scenario. You're building a leaderboard that tracks cumulative scores. Players update their scores constantly, and you need to answer "what is the total score of all players ranked 1 through i?" after every update. You naively tried a prefix sum array first — blazing fast for queries, but every score update meant rebuilding O(n) prefix sums. With 100k players and 500k updates a second, that's 50 billion operations per second. No.

The fix is a Fenwick tree. The update costs O(log n). The query costs O(log n). The code is shorter than what you're reading right now. And it's so cache-friendly that in practice it runs faster than a segment tree even when both are O(log n).

The lowbit trick: i & -i

Before you can understand the BIT, you need to understand one operation cold. In two's complement — the way virtually every modern CPU represents signed integers — negating a number flips all bits and adds one.

Take i = 6 (binary 0110):

  • Flip all bits: 1001
  • Add one: 1010

So -6 in binary is 1010. Now AND them:

  0110   (6)
& 1010   (-6)
------
  0010   (2)

The result is 0010 — only the lowest set bit survives. This works for any integer: when you add one to a flipped number, the carry propagates rightward until it hits the position of the original lowest set bit, restoring it and zeroing everything below. All higher bits cancel out in the AND.

What does i & -i give you?

i (decimal)i (binary)i & -imeaning
100011covers 1 element
200102covers 2 elements
300111covers 1 element
401004covers 4 elements
501011covers 1 element
601102covers 2 elements
701111covers 1 element
810008covers 8 elements

This is the entire BIT in one table. BIT[i] stores the sum of i & -i elements ending at index i. The ranges never overlap in a way that would make a query hit the same bucket twice. And because each step either adds or subtracts i & -i from i, you traverse at most log n nodes — one per bit.

The responsibility ranges, visualized

Let's make the ranges concrete for an 8-element array [3, 1, 4, 1, 5, 9, 2, 6]:

graph TD
    subgraph "BIT structure: which elements each node owns"
      B8["BIT[8] = sum[1..8] = 31\n(owns 8 elements)"]
      B4["BIT[4] = sum[1..4] = 9\n(owns 4 elements)"]
      B6["BIT[6] = sum[5..6] = 14\n(owns 2 elements)"]
      B2["BIT[2] = sum[1..2] = 4\n(owns 2 elements)"]
      B1["BIT[1] = arr[1] = 3\n(owns 1 element)"]
      B3["BIT[3] = arr[3] = 4\n(owns 1 element)"]
      B5["BIT[5] = arr[5] = 5\n(owns 1 element)"]
      B7["BIT[7] = arr[7] = 2\n(owns 1 element)"]
      B8 --> B4
      B8 --> B6
      B4 --> B2
      B4 --> B3
      B6 --> B5
      B6 --> B7
      B2 --> B1
    end
    style B8 fill:#00ff9d,color:#0a0a0f
    style B4 fill:#7c5cff,color:#fff
    style B6 fill:#7c5cff,color:#fff
    style B2 fill:#22d3ee,color:#0a0a0f

Now watch what happens when you query prefix(7) — the sum of the first 7 elements. Starting at i = 7:

  1. i = 7 (binary 111): 7 & -7 = 1. Add BIT[7] (covers just arr[7]). Move: 7 - 1 = 6.
  2. i = 6 (binary 110): 6 & -6 = 2. Add BIT[6] (covers arr[5..6]). Move: 6 - 2 = 4.
  3. i = 4 (binary 100): 4 & -4 = 4. Add BIT[4] (covers arr[1..4]). Move: 4 - 4 = 0.
  4. i = 0: stop.

Three lookups. Three disjoint ranges: [7], [5..6], [1..4]. Together they cover [1..7] exactly. No overlap, no gap. That's not a coincidence — it's a consequence of the binary representation of 7 (111): there are three set bits, and you peel off one per step.

And for prefix(6) vs prefix(7):

flowchart LR
    subgraph "prefix(7): peel bits off 111"
      P7_1["i=7: BIT[7]\nrange [7,7]"] --> P7_2["i=6: BIT[6]\nrange [5,6]"] --> P7_3["i=4: BIT[4]\nrange [1,4]"] --> STOP7["i=0 stop"]
    end
    style P7_1 fill:#7c5cff,color:#fff
    style P7_2 fill:#22d3ee,color:#0a0a0f
    style P7_3 fill:#00ff9d,color:#0a0a0f

The number of steps equals the number of set bits in i. In the worst case, a 32-bit integer has 32 bits — but n is bounded, so log n steps is the ceiling.

Building the tree

The BIT is 1-indexed. (This is not an accident — the i & -i trick breaks at i = 0.) You'll store it in an array of size n + 1, leaving index 0 unused.

class FenwickTree:
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)   # 1-indexed; index 0 is dead space

    def update(self, i, delta):
        """Add delta to arr[i]. Walk UP: i += i & -i."""
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i              # climb to the next ancestor responsible for i

    def prefix(self, i):
        """Sum of arr[1..i]. Walk DOWN: i -= i & -i."""
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i              # strip the lowest bit; jump to next disjoint range
        return total

    def query(self, l, r):
        """Sum of arr[l..r]."""
        return self.prefix(r) - self.prefix(l - 1)

    @classmethod
    def build(cls, arr):
        """Build from a 0-indexed array in O(n)."""
        n = len(arr)
        bit = cls(n)
        for i, val in enumerate(arr):
            bit.update(i + 1, val)   # shift to 1-indexed
        return bit

That's the whole thing. Seventeen lines including comments and the class wrapper. The two core operations are each three lines of logic.

Why i += i & -i climbs to an ancestor: adding the lowest set bit increments the bit directly above it (or higher). For example, 3 (011) → adding 1 (001) gives 4 (100). Node 4 is responsible for a range that includes the range covered by node 3, so it needs the update too.

Why i -= i & -i gives disjoint ranges: stripping the lowest bit moves you to the largest index that has no overlap with the range you just covered. After peeling 7640, the ranges [7], [5..6], [1..4] tile [1..7] perfectly.

Let's trace an update — say we add +2 to arr[3]:

flowchart TD
    U1["i=3 (011): update BIT[3]\ncovers [3,3]"]
    U2["i=4 (100): update BIT[4]\ncovers [1,4]"]
    U3["i=8 (1000): update BIT[8]\ncovers [1,8]"]
    U4["i=16 > n: stop"]
    U1 -->|"3 += 3&-3 = 1 → 4"| U2
    U2 -->|"4 += 4&-4 = 4 → 8"| U3
    U3 -->|"8 += 8&-8 = 8 → 16"| U4
    style U1 fill:#7c5cff,color:#fff
    style U2 fill:#22d3ee,color:#0a0a0f
    style U3 fill:#00ff9d,color:#0a0a0f

Three nodes touched for an 8-element tree — that's log₂(8) = 3. Perfect.

The complexity picture

OperationBITSegment TreeStatic prefix array
BuildO(n)O(n)O(n)
Point updateO(log n)O(log n)O(n)
Prefix queryO(log n)O(log n)O(1)
Range sumO(log n)O(log n)O(1)
Range min/maxO(log n)O(1)*
MemoryO(n)O(4n)O(n)
Code lines~10~40–60~5

* static prefix array for range min requires a sparse table (O(n log n) build).

The BIT and segment tree are neck-and-neck on the operations they share — but the BIT uses half the memory and is meaningfully simpler to implement under pressure. On modern hardware, the BIT also wins on cache behavior: its O(log n) access pattern touches a contiguous array with no pointer chasing, while a segment tree node jumps to index 2i+1 or 2i+2 (fine in theory, slightly worse in practice at large n due to cache line effects).

Patterns and worked problems

Pattern: range sum with point updates

This is the canonical case. Any time you have an array that changes element-by-element and you need sums over arbitrary ranges, reach for the BIT first.

Problem 1 (easy): Count of smaller numbers after self. Given an array, for each element, count how many elements to its right are smaller. Classic BIT pattern: process the array right to left, and for each element x, query prefix(x - 1) (elements already inserted that are smaller than x), then update(x, 1) to record x.

def count_smaller(nums):
    # coordinate compress: map values to [1..n]
    sorted_unique = sorted(set(nums))
    rank = {v: i + 1 for i, v in enumerate(sorted_unique)}
    n = len(nums)

    bit = FenwickTree(len(rank))
    result = []

    for num in reversed(nums):
        r = rank[num]
        result.append(bit.prefix(r - 1))   # count of elements < num already inserted
        bit.update(r, 1)                    # register this element

    return result[::-1]

Walking right-to-left, each element is registered in the BIT by its rank. prefix(r - 1) counts registered elements with a smaller rank — exactly those that appear to the right and are smaller than num. O(n log n) total.

Problem 2 (medium): Range sum queries with updates. This is the textbook case — LeetCode 307, "Range Sum Query – Mutable."

class NumArray:
    def __init__(self, nums):
        self.n = len(nums)
        self.bit = FenwickTree(self.n)
        self.arr = [0] * (self.n + 1)   # mirror of original values (1-indexed)
        for i, v in enumerate(nums):
            self._set(i + 1, v)

    def _set(self, i, val):
        self.bit.update(i, val - self.arr[i])   # only the delta
        self.arr[i] = val

    def update(self, index, val):
        self._set(index + 1, val)               # 0→1-indexed

    def sumRange(self, left, right):
        return self.bit.query(left + 1, right + 1)

The key subtlety: the BIT stores deltas, not absolute values. When you call update(index, val), you don't push val — you push val - old_val. Keep a mirror of the current values so you can compute the delta.

Pattern: range updates via difference BIT

Suppose you need range updates ("add 5 to all elements in [l, r]") and point queries ("what is the current value at index i?"). Flip the model: maintain a difference BIT D where the actual value at i is prefix(D, i).

  • Range update [l, r] by delta: update(D, l, +delta) and update(D, r+1, -delta). Two O(log n) ops.
  • Point query at i: prefix(D, i). One O(log n) op.

This is the same insight as a difference array (covered in the prefix sum technique), but now the underlying array is a BIT so both operations are O(log n) instead of O(1) update / O(n) query.

Pitfalls

Using 0-indexing without adapting. The BIT's i & -i trick breaks at i = 0 (it loops forever or gives nonsense). Always shift your indices by +1 when feeding them to the BIT. If your problem gives you 0-indexed positions, add 1 going in, subtract 1 coming out. Internalize this once; don't redo the reasoning every time.

Forgetting to store deltas. When you call update(i, val) on a mutable array problem, the BIT doesn't know the old value. If you push val instead of val - old_val, every update becomes additive and your answers spiral out of control after the first mutation. Keep a shadow array of current values for the delta.

Querying prefix(0): This should return 0 (empty sum). Most implementations handle it naturally — the while loop exits immediately — but double-check yours. If you call prefix(-1) by accident (e.g., query(l, r) with l = 1 calling prefix(l - 1) = prefix(0)), you want 0, not a crash or an infinite loop.

Trying to do range min/max. This is where people waste real time. The BIT cannot track range minimums — the "peel bits off" traversal gives you disjoint ranges that you accumulate with +, and that works for sums (and similar invertible operations). Minimum isn't invertible: if you split a range, you can't reconstruct min(A ∪ B) from min(A) and min(B) without knowing which was the global minimum. For range min/max, you need a segment tree. Don't try to hack the BIT into doing this.

Off-by-one on the update when coordinate-compressing. If you're using a BIT to count frequencies of values (like in the "count of inversions" problem), you need to map your values to a compact rank space first. If you skip coordinate compression and try to use raw values as indices, you'll either allocate a billion-element array or get index out of bounds. Sort the unique values, assign ranks 1..k, and use those as your BIT indices.

When NOT to use a Fenwick tree

Your actual needBetter choiceWhy
Static array, prefix sums onlyPrefix sum arrayO(1) queries, no overhead
Range minimum / maximum queriesSegment treeBIT can't do min/max
Range updates + range queriesSegment tree with lazy propagationTwo-BIT trick exists but segment tree is cleaner
Full sorted order / rank queriesOrder-statistics BSTBIT has no notion of sorted structure
Just need to sum a sequence onceSimple accumulationOverkill; don't bring a BIT to a sum() fight
No updates, only inversion countMerge sortO(n log n), same asymptotic, less setup

The BIT's sweet spot is narrow but common: a mutable array where you frequently need prefix (or range) sums. That's it. Outside that box, reach for something else.

If you're choosing between BIT and segment tree: if you can solve your problem with prefix sums and point updates, use the BIT — it's faster to write, easier to debug under pressure, and slightly faster in practice. If you need anything beyond that — range min, lazy range updates, or custom aggregation functions — use the segment tree and accept the extra complexity. Both are O(log n); the constants and code complexity tip the scale.

Putting it together: inversion count

The inversion count problem is the cleanest showcase of what a BIT can do. Given an array, count the number of pairs (i, j) where i < j and arr[i] > arr[j].

The brute force is O(n²). Merge sort gives O(n log n). And a BIT gives O(n log n) with arguably cleaner code:

def count_inversions(arr):
    # coordinate compress to ranks 1..n
    sorted_arr = sorted(set(arr))
    rank = {v: i + 1 for i, v in enumerate(sorted_arr)}
    n = len(arr)

    bit = FenwickTree(n)
    inversions = 0

    for val in arr:
        r = rank[val]
        # elements already processed that are GREATER than val
        # = (elements processed so far) - (elements processed that are ≤ val)
        elements_seen = bit.prefix(n)   # total elements inserted so far
        elements_leq  = bit.prefix(r)   # elements ≤ val inserted so far
        inversions += elements_seen - elements_leq
        bit.update(r, 1)

    return inversions

Process left to right. When you process val, every element already in the BIT came from the left. The elements in the BIT that are strictly greater than val form inversions with val. prefix(n) - prefix(r) counts exactly those elements. Then you register val and move on. O(n log n) total, O(n) space.

This is the Fenwick tree in its element: a running count over a dynamic set, answered with prefix queries. Start with arrays, layer on prefix sums for the static case, add a BIT for mutations, and reach for the segment tree when you need richer operations. For this exact job, nothing beats the BIT's simplicity.

// FAQ

Frequently asked questions

What is a Fenwick tree used for?

A Fenwick tree (also called a Binary Indexed Tree or BIT) is a data structure that supports two operations on an array in O(log n) each: update a single element by some delta, and query the prefix sum from index 1 to i. The classic use case is when you have an array that gets mutated frequently and you need running or range sums without rebuilding from scratch each time. It also handles range sum queries in O(log n) via two prefix queries. Competitive programmers reach for it as a leaner alternative to a segment tree for this specific problem shape.

What is the lowbit trick (i & -i) and why does it work?

In two's complement, -i is the bitwise NOT of i plus one. That cancels every bit except the lowest set bit, leaving only i's least-significant 1-bit. For example, 6 in binary is 0110; -6 is 1010; 6 & -6 = 0010 = 2. This isolated bit tells you the responsibility range of node i in the Fenwick tree: node i stores the sum of the 2 elements ending at i (where 2 equals i & -i). Adding i & -i to i moves to the next responsible ancestor during an update; subtracting it moves to the next non-overlapping prefix during a query.

Is a Fenwick tree the same as a segment tree?

No, though they solve similar problems. A Fenwick tree is leaner — about half the code, half the memory, and more cache-friendly — but only handles prefix (and range) sum queries plus point updates naturally. A segment tree is more general: it can handle range minimum, range maximum, arbitrary associative operations, and lazy propagation for range updates. If you specifically need prefix sums with point updates, a BIT is the better tool. If you need range min/max or lazy range updates, reach for a segment tree.

How do you handle range updates (not just point updates) with a Fenwick tree?

The standard trick is to maintain a difference array BIT instead of the value array. To add delta to every element in [l, r], do two point updates: update(l, +delta) and update(r+1, -delta). To query the current value at index i, query the prefix sum up to i on the difference BIT. This gives range update + point query in O(log n) each. For range update + range query simultaneously, you need two BITs — a standard technique but slightly more involved.

When should I use a Fenwick tree over a prefix sum array?

Use a plain prefix sum array when the array never changes after you build it — queries are O(1) and there is zero overhead. A Fenwick tree shines when the array is mutable: each point update on a static prefix sum array costs O(n) to rebuild; the BIT costs O(log n). The break-even is roughly one update per every log(n) queries — if you have more updates than that, the BIT wins. See the prefix sums technique article for the static case.

// RELATED

You may also like