Fenwick Trees (Binary Indexed Trees)
Fenwick trees give you O(log n) prefix-sum queries AND point updates in ~10 lines of code — leaner and faster in practice than a segment tree for this exact job. Learn the lowbit trick, build the tree, and know when a BIT beats everything else.
The moment you realize you need this
Here's a concrete scenario. You're building a leaderboard that tracks cumulative scores. Players update their scores constantly, and you need to answer "what is the total score of all players ranked 1 through i?" after every update. You naively tried a prefix sum array first — blazing fast for queries, but every score update meant rebuilding O(n) prefix sums. With 100k players and 500k updates a second, that's 50 billion operations per second. No.
The fix is a Fenwick tree. The update costs O(log n). The query costs O(log n). The code is shorter than what you're reading right now. And it's so cache-friendly that in practice it runs faster than a segment tree even when both are O(log n).
The lowbit trick: i & -i
Before you can understand the BIT, you need to understand one operation cold. In two's complement — the way virtually every modern CPU represents signed integers — negating a number flips all bits and adds one.
Take i = 6 (binary 0110):
- Flip all bits:
1001 - Add one:
1010
So -6 in binary is 1010. Now AND them:
0110 (6)
& 1010 (-6)
------
0010 (2)
The result is 0010 — only the lowest set bit survives. This works for any integer: when you add one to a flipped number, the carry propagates rightward until it hits the position of the original lowest set bit, restoring it and zeroing everything below. All higher bits cancel out in the AND.
What does i & -i give you?
| i (decimal) | i (binary) | i & -i | meaning |
|---|---|---|---|
| 1 | 0001 | 1 | covers 1 element |
| 2 | 0010 | 2 | covers 2 elements |
| 3 | 0011 | 1 | covers 1 element |
| 4 | 0100 | 4 | covers 4 elements |
| 5 | 0101 | 1 | covers 1 element |
| 6 | 0110 | 2 | covers 2 elements |
| 7 | 0111 | 1 | covers 1 element |
| 8 | 1000 | 8 | covers 8 elements |
This is the entire BIT in one table. BIT[i] stores the sum of i & -i elements ending at index i. The ranges never overlap in a way that would make a query hit the same bucket twice. And because each step either adds or subtracts i & -i from i, you traverse at most log n nodes — one per bit.
The responsibility ranges, visualized
Let's make the ranges concrete for an 8-element array [3, 1, 4, 1, 5, 9, 2, 6]:
graph TD
subgraph "BIT structure: which elements each node owns"
B8["BIT[8] = sum[1..8] = 31\n(owns 8 elements)"]
B4["BIT[4] = sum[1..4] = 9\n(owns 4 elements)"]
B6["BIT[6] = sum[5..6] = 14\n(owns 2 elements)"]
B2["BIT[2] = sum[1..2] = 4\n(owns 2 elements)"]
B1["BIT[1] = arr[1] = 3\n(owns 1 element)"]
B3["BIT[3] = arr[3] = 4\n(owns 1 element)"]
B5["BIT[5] = arr[5] = 5\n(owns 1 element)"]
B7["BIT[7] = arr[7] = 2\n(owns 1 element)"]
B8 --> B4
B8 --> B6
B4 --> B2
B4 --> B3
B6 --> B5
B6 --> B7
B2 --> B1
end
style B8 fill:#00ff9d,color:#0a0a0f
style B4 fill:#7c5cff,color:#fff
style B6 fill:#7c5cff,color:#fff
style B2 fill:#22d3ee,color:#0a0a0f
Now watch what happens when you query prefix(7) — the sum of the first 7 elements. Starting at i = 7:
i = 7(binary111):7 & -7 = 1. AddBIT[7](covers justarr[7]). Move:7 - 1 = 6.i = 6(binary110):6 & -6 = 2. AddBIT[6](coversarr[5..6]). Move:6 - 2 = 4.i = 4(binary100):4 & -4 = 4. AddBIT[4](coversarr[1..4]). Move:4 - 4 = 0.i = 0: stop.
Three lookups. Three disjoint ranges: [7], [5..6], [1..4]. Together they cover [1..7] exactly. No overlap, no gap. That's not a coincidence — it's a consequence of the binary representation of 7 (111): there are three set bits, and you peel off one per step.
And for prefix(6) vs prefix(7):
flowchart LR
subgraph "prefix(7): peel bits off 111"
P7_1["i=7: BIT[7]\nrange [7,7]"] --> P7_2["i=6: BIT[6]\nrange [5,6]"] --> P7_3["i=4: BIT[4]\nrange [1,4]"] --> STOP7["i=0 stop"]
end
style P7_1 fill:#7c5cff,color:#fff
style P7_2 fill:#22d3ee,color:#0a0a0f
style P7_3 fill:#00ff9d,color:#0a0a0f
The number of steps equals the number of set bits in i. In the worst case, a 32-bit integer has 32 bits — but n is bounded, so log n steps is the ceiling.
Building the tree
The BIT is 1-indexed. (This is not an accident — the i & -i trick breaks at i = 0.) You'll store it in an array of size n + 1, leaving index 0 unused.
class FenwickTree:
def __init__(self, n):
self.n = n
self.tree = [0] * (n + 1) # 1-indexed; index 0 is dead space
def update(self, i, delta):
"""Add delta to arr[i]. Walk UP: i += i & -i."""
while i <= self.n:
self.tree[i] += delta
i += i & -i # climb to the next ancestor responsible for i
def prefix(self, i):
"""Sum of arr[1..i]. Walk DOWN: i -= i & -i."""
total = 0
while i > 0:
total += self.tree[i]
i -= i & -i # strip the lowest bit; jump to next disjoint range
return total
def query(self, l, r):
"""Sum of arr[l..r]."""
return self.prefix(r) - self.prefix(l - 1)
@classmethod
def build(cls, arr):
"""Build from a 0-indexed array in O(n)."""
n = len(arr)
bit = cls(n)
for i, val in enumerate(arr):
bit.update(i + 1, val) # shift to 1-indexed
return bit
That's the whole thing. Seventeen lines including comments and the class wrapper. The two core operations are each three lines of logic.
Why i += i & -i climbs to an ancestor: adding the lowest set bit increments the bit directly above it (or higher). For example, 3 (011) → adding 1 (001) gives 4 (100). Node 4 is responsible for a range that includes the range covered by node 3, so it needs the update too.
Why i -= i & -i gives disjoint ranges: stripping the lowest bit moves you to the largest index that has no overlap with the range you just covered. After peeling 7 → 6 → 4 → 0, the ranges [7], [5..6], [1..4] tile [1..7] perfectly.
Let's trace an update — say we add +2 to arr[3]:
flowchart TD
U1["i=3 (011): update BIT[3]\ncovers [3,3]"]
U2["i=4 (100): update BIT[4]\ncovers [1,4]"]
U3["i=8 (1000): update BIT[8]\ncovers [1,8]"]
U4["i=16 > n: stop"]
U1 -->|"3 += 3&-3 = 1 → 4"| U2
U2 -->|"4 += 4&-4 = 4 → 8"| U3
U3 -->|"8 += 8&-8 = 8 → 16"| U4
style U1 fill:#7c5cff,color:#fff
style U2 fill:#22d3ee,color:#0a0a0f
style U3 fill:#00ff9d,color:#0a0a0f
Three nodes touched for an 8-element tree — that's log₂(8) = 3. Perfect.
The complexity picture
| Operation | BIT | Segment Tree | Static prefix array |
|---|---|---|---|
| Build | O(n) | O(n) | O(n) |
| Point update | O(log n) | O(log n) | O(n) |
| Prefix query | O(log n) | O(log n) | O(1) |
| Range sum | O(log n) | O(log n) | O(1) |
| Range min/max | ✗ | O(log n) | O(1)* |
| Memory | O(n) | O(4n) | O(n) |
| Code lines | ~10 | ~40–60 | ~5 |
* static prefix array for range min requires a sparse table (O(n log n) build).
The BIT and segment tree are neck-and-neck on the operations they share — but the BIT uses half the memory and is meaningfully simpler to implement under pressure. On modern hardware, the BIT also wins on cache behavior: its O(log n) access pattern touches a contiguous array with no pointer chasing, while a segment tree node jumps to index 2i+1 or 2i+2 (fine in theory, slightly worse in practice at large n due to cache line effects).
Patterns and worked problems
Pattern: range sum with point updates
This is the canonical case. Any time you have an array that changes element-by-element and you need sums over arbitrary ranges, reach for the BIT first.
Problem 1 (easy): Count of smaller numbers after self. Given an array, for each element, count how many elements to its right are smaller. Classic BIT pattern: process the array right to left, and for each element x, query prefix(x - 1) (elements already inserted that are smaller than x), then update(x, 1) to record x.
def count_smaller(nums):
# coordinate compress: map values to [1..n]
sorted_unique = sorted(set(nums))
rank = {v: i + 1 for i, v in enumerate(sorted_unique)}
n = len(nums)
bit = FenwickTree(len(rank))
result = []
for num in reversed(nums):
r = rank[num]
result.append(bit.prefix(r - 1)) # count of elements < num already inserted
bit.update(r, 1) # register this element
return result[::-1]
Walking right-to-left, each element is registered in the BIT by its rank. prefix(r - 1) counts registered elements with a smaller rank — exactly those that appear to the right and are smaller than num. O(n log n) total.
Problem 2 (medium): Range sum queries with updates. This is the textbook case — LeetCode 307, "Range Sum Query – Mutable."
class NumArray:
def __init__(self, nums):
self.n = len(nums)
self.bit = FenwickTree(self.n)
self.arr = [0] * (self.n + 1) # mirror of original values (1-indexed)
for i, v in enumerate(nums):
self._set(i + 1, v)
def _set(self, i, val):
self.bit.update(i, val - self.arr[i]) # only the delta
self.arr[i] = val
def update(self, index, val):
self._set(index + 1, val) # 0→1-indexed
def sumRange(self, left, right):
return self.bit.query(left + 1, right + 1)
The key subtlety: the BIT stores deltas, not absolute values. When you call update(index, val), you don't push val — you push val - old_val. Keep a mirror of the current values so you can compute the delta.
Pattern: range updates via difference BIT
Suppose you need range updates ("add 5 to all elements in [l, r]") and point queries ("what is the current value at index i?"). Flip the model: maintain a difference BIT D where the actual value at i is prefix(D, i).
- Range update
[l, r]by delta:update(D, l, +delta)andupdate(D, r+1, -delta). Two O(log n) ops. - Point query at
i:prefix(D, i). One O(log n) op.
This is the same insight as a difference array (covered in the prefix sum technique), but now the underlying array is a BIT so both operations are O(log n) instead of O(1) update / O(n) query.
Pitfalls
Using 0-indexing without adapting. The BIT's i & -i trick breaks at i = 0 (it loops forever or gives nonsense). Always shift your indices by +1 when feeding them to the BIT. If your problem gives you 0-indexed positions, add 1 going in, subtract 1 coming out. Internalize this once; don't redo the reasoning every time.
Forgetting to store deltas. When you call update(i, val) on a mutable array problem, the BIT doesn't know the old value. If you push val instead of val - old_val, every update becomes additive and your answers spiral out of control after the first mutation. Keep a shadow array of current values for the delta.
Querying prefix(0): This should return 0 (empty sum). Most implementations handle it naturally — the while loop exits immediately — but double-check yours. If you call prefix(-1) by accident (e.g., query(l, r) with l = 1 calling prefix(l - 1) = prefix(0)), you want 0, not a crash or an infinite loop.
Trying to do range min/max. This is where people waste real time. The BIT cannot track range minimums — the "peel bits off" traversal gives you disjoint ranges that you accumulate with +, and that works for sums (and similar invertible operations). Minimum isn't invertible: if you split a range, you can't reconstruct min(A ∪ B) from min(A) and min(B) without knowing which was the global minimum. For range min/max, you need a segment tree. Don't try to hack the BIT into doing this.
Off-by-one on the update when coordinate-compressing. If you're using a BIT to count frequencies of values (like in the "count of inversions" problem), you need to map your values to a compact rank space first. If you skip coordinate compression and try to use raw values as indices, you'll either allocate a billion-element array or get index out of bounds. Sort the unique values, assign ranks 1..k, and use those as your BIT indices.
When NOT to use a Fenwick tree
| Your actual need | Better choice | Why |
|---|---|---|
| Static array, prefix sums only | Prefix sum array | O(1) queries, no overhead |
| Range minimum / maximum queries | Segment tree | BIT can't do min/max |
| Range updates + range queries | Segment tree with lazy propagation | Two-BIT trick exists but segment tree is cleaner |
| Full sorted order / rank queries | Order-statistics BST | BIT has no notion of sorted structure |
| Just need to sum a sequence once | Simple accumulation | Overkill; don't bring a BIT to a sum() fight |
| No updates, only inversion count | Merge sort | O(n log n), same asymptotic, less setup |
The BIT's sweet spot is narrow but common: a mutable array where you frequently need prefix (or range) sums. That's it. Outside that box, reach for something else.
If you're choosing between BIT and segment tree: if you can solve your problem with prefix sums and point updates, use the BIT — it's faster to write, easier to debug under pressure, and slightly faster in practice. If you need anything beyond that — range min, lazy range updates, or custom aggregation functions — use the segment tree and accept the extra complexity. Both are O(log n); the constants and code complexity tip the scale.
Putting it together: inversion count
The inversion count problem is the cleanest showcase of what a BIT can do. Given an array, count the number of pairs (i, j) where i < j and arr[i] > arr[j].
The brute force is O(n²). Merge sort gives O(n log n). And a BIT gives O(n log n) with arguably cleaner code:
def count_inversions(arr):
# coordinate compress to ranks 1..n
sorted_arr = sorted(set(arr))
rank = {v: i + 1 for i, v in enumerate(sorted_arr)}
n = len(arr)
bit = FenwickTree(n)
inversions = 0
for val in arr:
r = rank[val]
# elements already processed that are GREATER than val
# = (elements processed so far) - (elements processed that are ≤ val)
elements_seen = bit.prefix(n) # total elements inserted so far
elements_leq = bit.prefix(r) # elements ≤ val inserted so far
inversions += elements_seen - elements_leq
bit.update(r, 1)
return inversions
Process left to right. When you process val, every element already in the BIT came from the left. The elements in the BIT that are strictly greater than val form inversions with val. prefix(n) - prefix(r) counts exactly those elements. Then you register val and move on. O(n log n) total, O(n) space.
This is the Fenwick tree in its element: a running count over a dynamic set, answered with prefix queries. Start with arrays, layer on prefix sums for the static case, add a BIT for mutations, and reach for the segment tree when you need richer operations. For this exact job, nothing beats the BIT's simplicity.
Frequently asked questions
▸What is a Fenwick tree used for?
A Fenwick tree (also called a Binary Indexed Tree or BIT) is a data structure that supports two operations on an array in O(log n) each: update a single element by some delta, and query the prefix sum from index 1 to i. The classic use case is when you have an array that gets mutated frequently and you need running or range sums without rebuilding from scratch each time. It also handles range sum queries in O(log n) via two prefix queries. Competitive programmers reach for it as a leaner alternative to a segment tree for this specific problem shape.
▸What is the lowbit trick (i & -i) and why does it work?
In two's complement, -i is the bitwise NOT of i plus one. That cancels every bit except the lowest set bit, leaving only i's least-significant 1-bit. For example, 6 in binary is 0110; -6 is 1010; 6 & -6 = 0010 = 2. This isolated bit tells you the responsibility range of node i in the Fenwick tree: node i stores the sum of the 2 elements ending at i (where 2 equals i & -i). Adding i & -i to i moves to the next responsible ancestor during an update; subtracting it moves to the next non-overlapping prefix during a query.
▸Is a Fenwick tree the same as a segment tree?
No, though they solve similar problems. A Fenwick tree is leaner — about half the code, half the memory, and more cache-friendly — but only handles prefix (and range) sum queries plus point updates naturally. A segment tree is more general: it can handle range minimum, range maximum, arbitrary associative operations, and lazy propagation for range updates. If you specifically need prefix sums with point updates, a BIT is the better tool. If you need range min/max or lazy range updates, reach for a segment tree.
▸How do you handle range updates (not just point updates) with a Fenwick tree?
The standard trick is to maintain a difference array BIT instead of the value array. To add delta to every element in [l, r], do two point updates: update(l, +delta) and update(r+1, -delta). To query the current value at index i, query the prefix sum up to i on the difference BIT. This gives range update + point query in O(log n) each. For range update + range query simultaneously, you need two BITs — a standard technique but slightly more involved.
▸When should I use a Fenwick tree over a prefix sum array?
Use a plain prefix sum array when the array never changes after you build it — queries are O(1) and there is zero overhead. A Fenwick tree shines when the array is mutable: each point update on a static prefix sum array costs O(n) to rebuild; the BIT costs O(log n). The break-even is roughly one update per every log(n) queries — if you have more updates than that, the BIT wins. See the prefix sums technique article for the static case.
You may also like
Union-Find (Disjoint Set Union)
The near-magic data structure for connectivity queries — are these two nodes in the same component? Nearly O(1) per operation with union by rank and path compression. Kruskal's MST, redundant connections, account merging, and more.
Tries (Prefix Trees)
The data structure powering autocomplete, spell-check, and IP routing — a tree where each edge is a character and every path from the root spells a prefix. Insert, search, and prefix queries all run in O(L) on word length alone, completely independent of how many words you store.
Strings
Strings look like simple text, but they hide a brutal trap: naive concatenation in a loop is O(n²). Learn why, how encoding actually works, and the handful of patterns — sliding window, two pointers, hashing — that solve 80% of string interview problems.