subarray-sum-equals-k

Difficulty: Medium · Topic: Arrays & Hashing · Practice it on parikshan: /practice/subarray-sum-equals-k/start

The story

A trading-systems engineer at Robinhood is hunting price drift inside an intraday minute-by-minute return series. Compliance wants the count of windows in the day where the cumulative minute-return summed to exactly the threshold value k, because each such window indicates a potential coordinated trade. The series has 23,400 minutes (390 minutes per session, 60 sessions per day across the supported assets). The compliance pipeline runs this calculation for thousands of symbols every evening; brute-force two-pointer scanning would not finish before the next trading day.

That is subarray-sum-equals-k. The classic exam-room version sounds purely academic until you realise it is the same shape as every "how many contiguous windows match property X" question: anomaly detection in metrics, exact-volume orders in market data, identical-token-count sliding windows in NLP, the list goes on.

What's actually being asked

Given an integer array nums and an integer k, return the number of contiguous non-empty subarrays whose elements sum to exactly k. Values can be negative, zero, or positive. The number of subarrays can be large (think hundreds or thousands per run); just count them.

The naive approach

Two nested loops. For each start i, accumulate a running sum from i to j and increment a counter whenever the sum equals k. O(n^2) time, O(1) space. At n = 2 * 10^4 and a two-second Python budget, this times out reliably.

A second naive variant: prefix sums plus two nested loops. Slightly cleaner code, same O(n^2) cost. Looks like progress, isn't.

The key insight

A subarray sum equals a difference of two prefix sums. Define pref[j] = sum(nums[0..j - 1]) with pref[0] = 0. Then the sum of nums[i..j - 1] equals pref[j] - pref[i]. The question "does this subarray sum to k?" becomes "does there exist an earlier prefix pref[i] such that pref[j] - pref[i] = k?".

That is Two Sum on prefix sums. Walk through indices maintaining a hash map from prefix-sum value to its count of occurrences. At each j, the number of subarrays ending at j that sum to k is exactly seen[pref[j] - k]. Add that to the answer, then increment seen[pref[j]].

Two subtleties anchor correctness:

Seed seen[0] = 1 so subarrays starting at index 0 are counted (their starting prefix is the empty prefix, value 0).
Store counts, not just membership; two distinct earlier prefixes can both have the same sum, and each contributes to the answer.

Step-by-step approach

def subarray_sum(nums, k):
    seen = {0: 1}
    pref = 0
    cnt = 0
    for x in nums:
        pref += x
        cnt += seen.get(pref - k, 0)
        seen[pref] = seen.get(pref, 0) + 1
    return cnt

Four lines of body. The seed {0: 1} is what makes "the whole prefix from start to j sums to k" countable.

Worked example

nums = [1, 1, 1]
k    = 2

j	x	pref before	target = pref - k	hits	seen before	seen after	cnt
-	-	0	-	-	{0: 1}	{0: 1}	0
0	1	1	-1	0	{0: 1}	{0: 1, 1: 1}	0
1	1	2	0	1	{0: 1, 1: 1}	{0: 1, 1: 1, 2: 1}	1
2	1	3	1	1	{0: 1, 1: 1, 2: 1}	{0: 1, 1: 1, 2: 1, 3: 1}	2

Final count: 2. Subarrays that sum to 2: nums[0..1] and nums[1..2]. Confirmed.

Try nums = [1, 2, 3], k = 3. The walk produces cnt = 2, matching [1, 2] and [3].

Complexity

Time: O(n), one pass.
Space: O(n) for the hash map of prefix-sum counts.

Brute force was O(n^2) time, O(1) space. The hash-map approach is asymptotically optimal: you must read each element at least once, and the map's amortised O(1) operations dominate the per-step cost.

Common pitfalls

Forgetting seen[0] = 1: misses every subarray that starts at index 0. The first test case that triggers this is nums = [k] for any k; the algorithm returns 0 instead of 1.
Storing booleans instead of counts: if multiple earlier indices share the same prefix sum, the count is doubled or tripled. Using set membership silently undercounts.
Updating the map before reading: would let the current prefix match itself when k = 0 and produce phantom subarrays. The correct order is read first, write second.
Trying to "extend" with a sliding window: classic mistake. Sliding window works only when all values are non-negative, because that gives monotonicity to the running sum. With negatives, the running sum oscillates and the window pointers cannot make a monotone decision.
k = 0 ambiguity: every two equal prefix sums delimit a zero-sum subarray, including the empty-prefix start. The algorithm handles this correctly because of the count semantics; many naive solutions get this wrong.
Off-by-one between j and j + 1 indexing: writing pref[j + 1] - pref[i] = k versus pref[j] - pref[i] = k is fine as long as you are consistent. Mixing the two conventions is the silent killer.

Where this shows up in the enterprise

Robinhood / Citadel intraday compliance scans: the story above; window-sum equality is a coordinated-trade pre-filter.
Cloudflare DDoS anomaly counting: count of 5-minute windows whose request delta equals a specific spike threshold; same algorithm over request-rate deltas.
AWS CloudWatch metric pattern matching: count of contiguous sample windows whose summed value matches a specific watermark; used in alarm-pattern dashboards.
Stripe risk windows: count of transaction windows summing to a flagged dollar amount within a session.
Snowflake query-cost auditing: contiguous query-cost windows that equal a tier-boundary dollar amount; used to find suspicious cost-shifting patterns.
Datadog log-volume scan: how many contiguous one-minute windows have a log-count delta of exactly k; used to find cyclic ingestion bugs.
Bloomberg corporate-action verification: contiguous trading-day windows where dividend-adjusted return sums to a known value; verification step for split processing.
Spotify session analytics: how often a user's contiguous play-time window equals exactly the album duration (suggests album-mode listening).

The pattern "contiguous windows summing to X" recurs anywhere events arrive in order and someone wants to count exact matches efficiently.

Variants you'll see elsewhere

find-pivot-index: split-on-equality, the additive cousin.
continuous-subarray-sum: subarray sum divisible by k; use pref % k as the hash key.
binary-subarrays-with-sum: same algorithm on a 0/1 array.
path-sum-iii: same algorithm on the path-prefix sums of a binary tree.
subarrays-with-k-different-integers: harder; sliding window with counter tracking.
count-subarrays-with-equal-zeros-and-ones: convert 0 to -1, then this is exactly subarray-sum-equals-0.
make-sum-divisible-by-p: variant where you delete a subarray to make the rest divisible.

The "prefix sums plus hash map" idiom is one of the highest-leverage tricks in arrays and hashing.

How parikshan turns this into a learning loop

Read this article, click into practice. The editor runs your code against public tests plus per-session dynamic private tests generated fresh each attempt, so memorising a gist fails. Hints are graduated: free first hint, small penalty for the second, larger for the third. The AI training assistant in exam mode offers Explain, Review, Find Bugs, Add Comments, Optimise, each with a small integrity cost so you spend AI deliberately. In practice mode AI chat is free, the right place to argue with yourself about why the empty-prefix seed matters. After the verdict, the dispute flow sends contested results to a human reviewer in one click. Attempt in practice mode first, then redo in exam mode untouched. When you can write the four-line solution in five minutes and explain the seed in another two, the pattern is internalised.

In the AI-integrated workspace

An AI agent will, depending on the prompt phrasing, write either the O(n^2) two-loop version or the O(n) prefix-sum version. Your job is to:

Recognise that "count contiguous subarrays with sum X" is exactly this problem. Spec writers will phrase it as "number of windows of any size whose total is k". The pattern recognition is the entire skill.
Reject sliding-window attempts when values can be negative. Generated code routinely tries a two-pointer sliding window because it superficially fits "subarray sum"; on negative inputs it silently undercounts.
Verify the seed. The agent will sometimes write seen = {} and add the empty-prefix only when it appears organically. That is a bug; seed it explicitly.
Verify storage is counts, not booleans. defaultdict(bool) undercounts. defaultdict(int) is the correct primitive.
For variants (divisible-by-k, equal-zeros-and-ones), the agent must transform the key, not just the comparison. Push for the right transformation.

A senior engineer reads "count subarrays summing to k" and immediately reaches for prefix-sum-plus-hash-map. The candidate who can do that productively turns review comments into one-line corrections. The candidate who cannot ships the O(n^2) version, watches it time out under load, and spends the next sprint reading post-mortems. parikshan trains the first kind.