~/techniques/prefix-sum
◆◆Intermediateasked at Amazonasked at Googleasked at Meta

Prefix Sums

Precompute running totals so any range sum becomes one subtraction. The prefix sum technique turns O(n) per query into O(1) — and the hashmap combo unlocks a whole class of subarray counting problems.

10 min read2026-01-20Ironclad Academy
Complexity cheat sheet
Operation
Average
Worst
Build prefix arrayone linear pass
O(n)
O(n)
Range sum queryone subtraction after build
O(1)
O(1)
Subarray sum equals K (hashmap)single pass with a running count
O(n)
O(n)
Space overheadthe prefix array itself
O(n)
O(n)
// DEPTH
the full breakdown — requirements, capacity, evolution, trade-offs

The mental model: a mileage odometer

Think of driving across a country. At every city you record the total miles driven since you left home. To find the distance between city A and city B, you don't re-drive the route — you just subtract two odometer readings.

Prefix sums are that odometer for an array.

You scan the array once, accumulating a running total. The result is a new array where prefix[i] is the sum of the first i elements. Then for any range [l, r], the sum is just:

sum(arr[l..r]) = prefix[r+1] - prefix[l]

That's it. One subtraction. Let's make it concrete.

flowchart LR
    subgraph "Original array"
        direction LR
        A0["arr[0]=3"] --- A1["arr[1]=1"] --- A2["arr[2]=4"] --- A3["arr[3]=1"] --- A4["arr[4]=5"]
    end
    subgraph "Prefix array (length n+1)"
        direction LR
        P0["pre[0]=0"] --- P1["pre[1]=3"] --- P2["pre[2]=4"] --- P3["pre[3]=8"] --- P4["pre[4]=9"] --- P5["pre[5]=14"]
    end
    style P0 fill:#22d3ee,color:#0a0a0f
    style P3 fill:#7c5cff,color:#fff
    style P5 fill:#7c5cff,color:#fff

Range sum for arr[2..4] (values 4, 1, 5 → should be 10): prefix[5] - prefix[2] = 14 - 4 = 10. Correct. No scanning.

The leading zero at prefix[0] is load-bearing. Without it, querying a subarray that starts at index 0 becomes a special case you have to handle separately. Seed with zero and the formula works uniformly for every range including [0, r].

Building it

The code is almost embarrassingly short.

def build_prefix(arr):
    n = len(arr)
    prefix = [0] * (n + 1)          # one extra slot for the leading zero
    for i in range(n):
        prefix[i + 1] = prefix[i] + arr[i]
    return prefix

def range_sum(prefix, l, r):
    """Sum of arr[l..r], inclusive."""
    return prefix[r + 1] - prefix[l]

Build: O(n) time, O(n) space. Query: O(1).

There's no trick to the build — it's one pass. The trick is recognizing when to build one. The tell is usually "multiple queries" (the problem mentions Q queries, or a function that will be called repeatedly) or "find subarrays" (which you'll want to scan efficiently).

The full complexity picture

OperationTimeSpaceNotes
Build prefix arrayO(n)O(n)done once
Single range queryO(1)after build
K total queriesO(n + K)O(n)vs. O(nK) brute force
Subarray sum equals KO(n)O(n)hashmap combo

At one million queries on an array of 10,000 elements, brute force is 10 billion operations. Prefix sum is 10,000 to build and 1 million lookups — five orders of magnitude faster. The O(n) prep is an investment with infinite-return queries.

The hashmap combo: subarray sum equals K

This is where prefix sums go from "handy trick" to "essential weapon." The problem:

Given an integer array and a target K, count the number of contiguous subarrays that sum to exactly K.

Brute force: two nested loops, O(n²). Fine for n = 100, dead on n = 10⁵.

The prefix sum insight: a subarray arr[l..r] sums to K exactly when prefix[r+1] - prefix[l] == K, which rearranges to prefix[l] == prefix[r+1] - K. So as you build the prefix sum left to right, at each step you're asking: "how many earlier prefix values equal my current prefix minus K?" A hashmap answers that in O(1).

from collections import defaultdict

def subarray_sum_equals_k(arr, k):
    count = 0
    running_sum = 0
    seen = defaultdict(int)
    seen[0] = 1          # empty prefix, sum = 0, has appeared once

    for num in arr:
        running_sum += num
        # how many times has (running_sum - k) appeared?
        count += seen[running_sum - k]
        seen[running_sum] += 1

    return count

The seen[0] = 1 initialization is the same idea as the leading zero in the array version — it handles subarrays starting at index 0 without special-casing. Forget it and you'll drop valid answers any time the entire prefix from index 0 sums to K.

Let's trace through arr = [1, 2, 3], k = 3:

indexnumrunning_sumseen[running_sum - k]countseen
00{0: 1}
011seen[-2] = 00{0:1, 1:1}
123seen[0] = 11{0:1, 1:1, 3:1}
236seen[3] = 12{0:1, 1:1, 3:2, 6:1}

Result: 2 subarrays ([1,2] and [3]). Correct, and the whole thing ran in one pass.

The pattern generalizes. "Count subarrays where the number of X minus number of Y equals K" — turn it into a prefix-difference problem and apply the same trick. "Find longest subarray summing to at most K" — prefix sum plus binary search or two pointers. Recognizing "this is a prefix-sum-plus-hashmap problem" is the unlock for a whole family of questions.

Worked problems

Problem 1: Range sum query (static) — LeetCode 303

You're given an array that won't change. Answer Q range-sum queries.

class NumArray:
    def __init__(self, nums):
        n = len(nums)
        self.prefix = [0] * (n + 1)
        for i in range(n):
            self.prefix[i + 1] = self.prefix[i] + nums[i]

    def sumRange(self, left, right):
        return self.prefix[right + 1] - self.prefix[left]

O(n) setup. Each query is O(1). If someone asks you to add point updates too — update nums[i] and still answer range queries — this breaks down. That's a Fenwick tree problem.

Problem 2: Subarray sum equals K — LeetCode 560

Already worked above. The key insight: you need the hashmap to count, not just detect. If the question asked "does any subarray sum to K?" you could stop at the first hit. "Count all" means you track every occurrence of each prefix sum.

Problem 3: Longest subarray with sum at most K — LeetCode 862 variant

def longest_subarray_sum_at_most_k(arr, k):
    n = len(arr)
    prefix = [0] * (n + 1)
    for i in range(n):
        prefix[i + 1] = prefix[i] + arr[i]

    best = 0
    # for each right endpoint, find the furthest left endpoint
    # such that prefix[r] - prefix[l] <= k
    # i.e., prefix[l] >= prefix[r] - k
    # two-pointer works when all values are non-negative
    left = 0
    for right in range(1, n + 1):
        while prefix[right] - prefix[left] > k:
            left += 1
        best = max(best, right - left)
    return best

This works when elements are non-negative (so the prefix array is monotonically increasing, making the two-pointer valid). For negative values, the problem gets much harder — LeetCode 862 uses a monotonic deque. Note how the prefix array still does the heavy lifting; the complexity on top of it changes based on constraints.

The 2D extension

The same idea extends to grids. A 2D prefix sum table lets you answer "what is the sum of all values inside this rectangle?" in O(1) after an O(m×n) build.

def build_2d_prefix(grid):
    m, n = len(grid), len(grid[0])
    # extra row and column of zeros at the top and left
    prefix = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            prefix[i][j] = (grid[i-1][j-1]
                            + prefix[i-1][j]    # row above
                            + prefix[i][j-1]    # column left
                            - prefix[i-1][j-1]) # subtract overlap counted twice
    return prefix

def rect_sum(prefix, r1, c1, r2, c2):
    """Sum of grid[r1..r2][c1..c2], inclusive."""
    return (prefix[r2+1][c2+1]
            - prefix[r1][c2+1]
            - prefix[r2+1][c1]
            + prefix[r1][c1])

The inclusion-exclusion formula — add the big rectangle, subtract the two L-shaped regions that overhang, add back the corner you subtracted twice — is the entire trick. Draw it on paper once and you'll never misremember it.

This comes up in LeetCode 304 (Range Sum Query 2D), image processing (summed-area tables for box blur), and any DP problem over a grid where you need rectangle sums.

flowchart TD
    subgraph "2D prefix — rectangle sum"
        BIG["prefix[r2+1][c2+1]<br/>(big rectangle)"]
        TOP["- prefix[r1][c2+1]<br/>(top strip)"]
        LEFT["- prefix[r2+1][c1]<br/>(left strip)"]
        CORNER["+ prefix[r1][c1]<br/>(add back corner)"]
        BIG --> TOP --> LEFT --> CORNER
    end
    style BIG fill:#7c5cff,color:#fff
    style CORNER fill:#00ff9d,color:#0a0a0f

Pitfalls that bite people

Forgetting the leading zero. This is the #1 mistake. If you make the prefix array the same length as the input instead of n+1, computing prefix[r+1] goes out of bounds and you're forced into an ugly conditional for the full-prefix case. Always allocate n+1 and set prefix[0] = 0.

Mutating the array after building the prefix. A prefix array is a snapshot. Any update to arr[i] invalidates every prefix[j] for j > i. If the problem says "update arr[i] and then answer range queries," a static prefix array is wrong. Use a Fenwick tree — it handles both in O(log n).

Using prefix sums when a sliding window is cleaner. If the problem has a fixed window size K ("max sum of K consecutive elements"), a sliding window is simpler: O(n) with O(1) space. Prefix sums add O(n) space for no gain. The smell is "fixed window" — sliding window. "Variable window / subarray summing to target" — prefix sums.

Integer overflow on large inputs. In Python this is irrelevant (arbitrary precision). In Java or C++, a prefix sum over a large array of large values overflows int silently. Use long / int64_t.

The hashmap combo when you need indices, not just counts. subarray_sum_equals_k counts subarrays. If the problem wants the actual indices (or the longest such subarray), the hashmap stores the first index where each prefix sum appeared — seen[running_sum] = i — and you compute the length from the index difference. Small change; easy to conflate the two.

When NOT to use a prefix sum

SituationBetter tool
Updates mixed with range queriesFenwick tree — O(log n) for both
Fixed-size window, no targetsSliding window — O(n), O(1) space
One range query, no repeatsJust scan the range — O(n), zero setup
Range min/max instead of sumSparse table or segment tree
The array is the output, not inputYou may want the difference array (inverse of prefix sum)

The difference array deserves a mention: if you need to apply many range-increment operations and read the final values once, you do the inverse — mark the start and end of each range and scan once at the end. It's prefix sums running backward.

Pure prefix sums are best when the array is built once and queried many times. The moment "built once" breaks down, step up to the Fenwick tree — it's the natural upgrade, and it's not much more code than what you see here.

One more thing worth saying explicitly: prefix sums are one of those techniques that look trivial in isolation but keep showing up embedded inside harder problems. A DP problem that involves subarray sums? The state transition probably wants prefix sums. A graph problem with weighted paths? There might be a prefix-sum reformulation. Getting the O(n) precompute instinct deeply ingrained is what lets you see the optimization when the problem doesn't announce itself.

For the underlying array operations that make all of this work — why random access is O(1), why appending to the prefix array is amortized free — see the Arrays deep dive. And if you reach for a hash table for the hashmap combo, that article covers why the average O(1) lookup holds and where it can degrade.

// FAQ

Frequently asked questions

What is a prefix sum and when should I use it?

A prefix sum array stores running totals: prefix[i] holds the sum of all elements from index 0 through i-1. Once built in O(n), any range sum arr[l..r] is just prefix[r+1] - prefix[l] — constant time. Use it whenever you face multiple range-sum queries on a static array, or when a problem asks about subarrays summing to a target.

How does the prefix sum + hashmap trick for subarray sum equals K work?

Keep a running sum as you scan. At each index, check whether (running_sum - k) has appeared before — if it has, you found a subarray ending here that sums to k. Store every running sum in a hashmap with a count of how many times it appeared. This gives O(n) time and O(n) space instead of the O(n²) brute force.

What is the off-by-one to watch out for with prefix sums?

The standard trick is to make the prefix array one element longer than the input, seeding prefix[0] = 0. Then prefix[i] = sum of arr[0..i-1]. Range sum for arr[l..r] is prefix[r+1] - prefix[l]. If you skip the leading zero and index directly into the input, the edge cases around the first element get fiddly fast.

When should I use a Fenwick tree instead of a plain prefix sum array?

A plain prefix array is read-only after construction — if the underlying array changes, you have to rebuild it in O(n). A Fenwick tree (also called a Binary Indexed Tree) supports both point updates and range queries in O(log n) each. Reach for a Fenwick tree when the problem mixes updates with queries; use a plain prefix array when the input is static.

Can prefix sums handle 2D range queries?

2D prefix sums are a direct extension. Build a 2D prefix table where prefix[i][j] is the sum of the rectangle from (0,0) to (i-1, j-1). Then any sub-rectangle sum is four lookups and some arithmetic. The pattern shows up in image processing (summed-area tables), LeetCode 304, and any grid-based problem that asks about rectangular region sums.

// RELATED

You may also like