Algorithms - 4. Prefix Sums

When to use Prefix Sums

Use prefix sums when you need to repeatedly query cumulative properties over ranges in an array. While most commonly used for range sums, it can be adapted for:

Range sum queries (classic use case)
Subarray sum problems (find subarrays with target sum)
Cumulative frequency (count elements in ranges)
Range product (with modification using logarithms or separate arrays)
2D matrix region sums (extending to 2D prefix sums)
Difference arrays (applying range updates efficiently)
Any problem requiring multiple queries of cumulative properties over intervals

Example problem

Daily Revenue Analysis: Given daily sales [3, 5, 2, 8, 4, 6, 1], you need to answer multiple queries about revenue for different date ranges.

Query: "What's the total revenue from day 2 to day 5?"
Query: "What's the total revenue from day 0 to day 3?"
Query: "What's the total revenue from day 4 to day 6?"

Answer: Build a prefix sum array once, then answer each query in O(1) time. For day 2 to 5: the sum is 20 (2 + 8 + 4 + 6).

Single most important insight

Any range sum equals the difference between two prefix sums - by precomputing cumulative sums from the start, we can calculate any subarray sum instantly using subtraction: sum(i, j) = prefix[j] - prefix[i-1].

Mental note: The algorithm's power lies in transforming the problem. Instead of summing elements repeatedly (O(n) per query), we precompute all partial sums once. At query time, we only need one subtraction: "total up to right boundary" minus "total before left boundary". This transforms repeated O(n) operations into O(1) lookups.

Why "Prefix Sum"?: The name comes from storing the sum of each prefix (beginning portion) of the array. For array [3, 5, 2, 8], the prefixes are [3], [3, 5], [3, 5, 2], and [3, 5, 2, 8]. We store their sums: [0, 3, 8, 10, 18]. The technique is called "Prefix Sum" specifically because the classic use case is addition—though the same pattern works with other operations (prefix product), they're less common.

Identity elements matter: The initial value depends on the operation: 0 for addition (since x + 0 = x), 1 for multiplication (since x × 1 = x). For prefix products, you'd use prefix[0] = 1 and the inverse operation (division) to extract ranges: product(left, right) = prefix[right+1] / prefix[left]. This parallels how subtraction is the inverse of addition in the sum case.

Decision Framework

The prefix sum technique involves two phases:

Build phase: Accumulate values using the operation (addition, multiplication, etc.) as you traverse the array once

Query phase: For any range [left, right], use the inverse operation on two precomputed values to extract the range result

Addition → Subtraction: sum(left, right) = prefix[right+1] - prefix[left]
Multiplication → Division: product(left, right) = prefix[right+1] / prefix[left]
Maximum → Maximum (idempotent): max(left, right) = max(prefix[right+1], -prefix[left]) (requires different approach like segment trees)

Code

// Build phase: precompute all prefix sums
function buildPrefixSum(nums) {
  // Start with 0 to represent empty prefix
  const prefix = new Array(nums.length + 1).fill(0);

  for (let i = 0; i < nums.length; i++) {
    const currentNum = nums[i];
    const sumSoFar = prefix[i];

    // DECISION FRAMEWORK: accumulate running total
    prefix[i + 1] = sumSoFar + currentNum;
  }

  return prefix;
}

// Query phase: extract range using inverse operation
function rangeSum(prefix, left, right) {
  const totalUpToRight = prefix[right + 1];
  const totalBeforeLeft = prefix[left];

  // DECISION FRAMEWORK: subtract to extract the range
  return totalUpToRight - totalBeforeLeft;
}

// Example usage with daily revenue data
const dailySales = [3, 5, 2, 8, 4, 6, 1];
const prefixSums = buildPrefixSum(dailySales);

console.log(rangeSum(prefixSums, 2, 5)); // 20 (days 2-5: 2+8+4+6)
console.log(rangeSum(prefixSums, 0, 3)); // 18 (days 0-3: 3+5+2+8)
console.log(rangeSum(prefixSums, 4, 6)); // 11 (days 4-6: 4+6+1)

Doubt buster

Common doubt: "Why add an extra element at the beginning of the prefix array? Can't we just use the same indices as the original array?"

Why you might want to avoid the extra element

It seems more intuitive to have a one-to-one correspondence:

Original array: [3, 5, 2, 8, 4, 6, 1]
Prefix array: [3, 8, 10, 18, 22, 28, 29] (same length)

This way, prefix[i] would be the sum of elements from 0 to i, which feels natural.

Why the extra element is crucial

Key realization: Without the extra element, you need special case handling for ranges starting at index 0!

Let's see what happens without the extra element:

Original: [3, 5, 2, 8, 4, 6, 1]
Prefix:   [3, 8, 10, 18, 22, 28, 29]

Query sum(2, 5):
  = prefix[5] - prefix[1]
  = 28 - 8
  = 20 ✓ (correct: 2+8+4+6)

Query sum(0, 3):
  = prefix[3] - prefix[-1]  // Oops! Index -1 doesn't exist
  = prefix[3] - ???
  = Need special case: if (left == 0) return prefix[right]

The elegant solution: prefix[0] = 0

By adding a zero at the beginning:

Original: [3, 5, 2, 8, 4, 6, 1]
Prefix:   [0, 3, 8, 10, 18, 22, 28, 29]
          ↑
          This represents "sum of zero elements"

Query sum(0, 3):
  = prefix[4] - prefix[0]
  = 18 - 0
  = 18 ✓ (correct: 3+5+2+8)

Why this pattern guarantees correctness

The formula sum(left, right) = prefix[right+1] - prefix[left] works universally because:

prefix[i] = sum of first i elements (not including element at index i)
prefix[right+1] = sum of all elements up to and including index right
prefix[left] = sum of all elements before index left
The difference gives exactly the elements in range [left, right]

This eliminates all special cases and makes the code cleaner, more maintainable, and less error-prone. The extra memory cost (one element) is negligible compared to the clarity gained.