combination-sum

Difficulty: Medium · Topic: Backtracking · Practice it on parikshan: /practice/combination-sum/start

The story

You build the checkout engine at Razorpay. A merchant wants every way a customer could pay 2,500 rupees using a fixed menu of UPI denominations, say 100, 200, 500, and 2000, with no limit on how many notes of each denomination they tap. The risk team uses this list to compute friction per payment plan; the UX team uses it to pick the default suggestion. The fundamental question: enumerate every multiset of values from a given candidate set that sums to a target.

That is combination-sum. The same shape appears in tax accounting (every way to allocate a deductible across allowable buckets), in cap-table modelling (every way to issue shares from a class pool to total an investment), and in genomics (every way to combine reagent doses to a target concentration). The defining twist is reuse is allowed, a single candidate can appear any number of times in one combination, and that is what separates it from plain subsets.

What's actually being asked

Given a set of k distinct positive integers and a positive target, output every combination of values whose sum equals target. Each value may be used any number of times. Combinations are unordered multisets; print each as a non-decreasing sequence, with the combinations themselves in lex order.

The naive approach

You could enumerate every multiset up to some max length and filter the ones summing to target. With target = 30 and candidates as small as 1, the length could be 30, and the search space is k^30, utterly intractable. The naive answer is correct in principle and useless in practice.

The right approach is backtracking with two prunes: stop when remaining hits zero (success), stop when the current candidate already exceeds remaining (dead end). The combination of monotone candidate order and the early break is what makes the search tight.

The canonical backtracking skeleton

def backtrack(state):
    if is_solution(state):
        record(state)
        return
    for choice in choices(state):
        if prune(state, choice):
            continue   # or break, when the prune is monotone
        apply(state, choice)
        backtrack(state)
        undo(state, choice)

For combination-sum:

state: (path, start, remaining).
is_solution: remaining == 0.
choices: every index i in [start, k).
prune: cand[i] > remaining. Because cand is sorted ascending, every later index is also too large, break, not continue.
apply: path.append(cand[i]).
recurse: dfs(i, remaining - cand[i]), note i, not i + 1.
undo: path.pop().

That i versus i + 1 is the entire "reuse allowed" twist. In subsets we passed i + 1 to prevent revisiting; here we pass i because revisiting is exactly what we want. The monotone start still prevents the same multiset from being emitted in two different orders.

The key insight

Why pass i and not i + 1? Because we want [2, 2, 3] but not [2, 3, 2]. The start index forbids ever going backwards, so once we move from 2 to 3, we cannot return to 2. That gives every multiset a single canonical non-decreasing representation, which is exactly one path in the DFS. No duplicates, no post-sort, lex order for free.

Why prune with break and not continue? Because the candidates are sorted ascending. If cand[i] > remaining, every cand[j] for j > i is also > remaining. Skipping with continue would still be correct but wasteful; break exits the loop immediately. On the stress-large test (cand = [2..8], target = 30) that single character difference is the gap between AC and TLE.

Step-by-step approach

Sort cand ascending.
Define dfs(start, remaining) carrying a global path.
Base: if remaining == 0, append list(path) to the output and return.
Loop i from start to k - 1. If cand[i] > remaining, break. Else push cand[i], call dfs(i, remaining - cand[i]), pop.
Call dfs(0, target). Print the count, then every combination.

Worked example

cand = [2, 3, 6, 7], target = 7

Sorted: [2, 3, 6, 7]. DFS unfolds:

start=0, rem=7, path=[]. Try i=0 (val 2): push, recurse start=0, rem=5.
- Try i=0 (val 2): push, recurse start=0, rem=3.
  - Try i=0 (val 2): push, recurse start=0, rem=1. Loop: cand[0]=2 > 1, break. Backtrack, pop.
  - Try i=1 (val 3): push, recurse start=1, rem=0. Record [2, 2, 3]. Pop.
  - Try i=2 (val 6): 6 > 3, break.
- Pop. Try i=1 (val 3): push, recurse start=1, rem=2. Loop: cand[1]=3 > 2, break. Pop.
- Try i=2 (val 6): 6 > 5, break.
Pop. Try i=1 (val 3): push, recurse start=1, rem=4. Try i=1 (val 3): push, recurse rem=1, break. Pop. Try i=2 (val 6): 6 > 4, break. Pop.
Try i=2 (val 6): push, recurse rem=1, break. Pop.
Try i=3 (val 7): push, recurse rem=0. Record [7]. Pop.

Output: 2 followed by 2 2 3 and 7. Matches the spec.

Complexity

The branching factor is bounded by k and the depth by target / min(cand). In the worst case the recursion explores O(k^(target / min(cand))) paths, with the prune dramatically reducing the constant. For the problem's bounds (k <= 9, target <= 30) the search is small enough to be near-instant.

Time: O(k^d) in the worst case where d = target / min(cand); in practice the prune makes it far smaller.
Space: O(d) for the recursion stack plus the output.

Common pitfalls

Passing i + 1 instead of i: kills the reuse semantics; you get subsets instead.
Forgetting to sort: the prune relies on monotone candidates. Without sorting, you cannot break on cand[i] > remaining, you would have to continue and lose the optimisation.
Using continue instead of break: correct but slow. The stress test exposes this.
Recording without copy: same out.append(list(path)) rule as every backtracking problem.
Off-by-one on remaining: if you allow remaining < 0 paths and check at the top of the recursion, you do extra work. Better: check at the choice point (cand[i] > remaining).

Where this shows up in the enterprise

Razorpay / Stripe / PayPal: every way to assemble a payout from a fixed set of payment-instrument denominations.
Splitwise / Pleo: every way to break an expense into a fixed list of allowable per-person buckets.
Snowflake billing: every combination of credit packs that totals a quoted price.
Pharma compounding: every way to mix doses from a fixed list of vials to reach a prescribed concentration.
Compiler autotuning: every combination of repeated optimisation flags whose total cost-budget matches the configured ceiling.
3D printing: every multiset of filament cartridges whose total length matches a print job.
Recipe optimisation: every multiset of ingredient packs whose total weight matches a target (subscription-meal companies use this for cart packing).

When the candidate set grows large or the target balloons, this morphs into a DP problem (coin-change, coin-change-ii) because counting solutions becomes more useful than enumerating them.

Variants you'll see elsewhere

combination-sum-ii: candidates may repeat, but each candidate can only be used once per combination. Pass i + 1 and add a skip-duplicate rule.
combination-sum-iii: candidates are fixed (1..9), use each at most once, target a specific count and sum.
combination-sum-iv: count the number of compositions where order matters. Drop backtracking, switch to DP, the result counts permutations, not multisets.
coin-change: minimum coin count to make target. Same candidate idea, but optimisation not enumeration, DP wins.
coin-change-ii: count multisets (no order). DP with start-index style outer loop.

All five are the same combinatorial shape with different counting questions. Recognising which one the spec is asking is half the battle.

How parikshan turns this into a learning loop

The dynamic private tests vary candidate sets and targets so that the break versus continue distinction shows up: a continue-only solution passes the small public tests and times out on the stress private. The graduated hints (hint 3 explicitly names the break prune at a 15-point integrity cost) are tuned to nudge without giving away the trick. The dispute flow exists for the edge case where you believe the judge mis-orders a tied combination, though in practice the spec's lex order is well-defined and the judge is correct. Run the problem in practice mode, take hints only when stuck for genuinely five minutes, then redo in exam mode at full speed.

In the AI-integrated workspace

Agents are exceptionally good at combination-sum because it is a textbook problem. They are exceptionally bad at three things on it:

Choosing break versus continue. Generated code defaults to continue because it is the "safer" looking option. On the stress test that decision is a TLE.
Recursing with i versus i + 1. This is a 50/50 coin flip in generated code if you do not specify in the prompt. The spec is "reuse allowed", and the right call is i. Read the recursion and verify.
Sorting the input. Without it, the prune is wrong and the order is wrong. Generated code sometimes sorts the output afterwards "to satisfy lex order", that is a smell. The right place to sort is once, up front.

The senior-engineer move is to name the pattern ("combination-sum with reuse, sort and break") and then read three lines of the agent's code: the sort, the loop's break, and the recursion's i. If those three are right, the rest is mechanical.