parikshan

subsets

Difficulty: Medium  ·  Topic: Backtracking  ·  Practice it on parikshan: /practice/subsets/start

The story

You run feature-flag rollout for a SaaS like LaunchDarkly or Optimizely. A new release has nine flags that interact, and your QA lead wants test coverage on every possible combination of enabled flags before customers see them. Nine flags means 2^9 = 512 configurations. She is willing to wait while CI churns through them, but she wants the list emitted deterministically and in a sane order so the regression report is diffable across runs.

That is subsets. The same shape shows up in compliance auditing (every subset of permissions a role might be granted), in marketing (every subset of segments in an A/B/n test), and in chip design (every subset of pipeline stages to ablate). The problem is universal: enumerate the power set of an input, no item missed, no item repeated.

What's actually being asked

Given an array of n distinct integers, output every subset of it. There are exactly 2^n of them, including the empty set. Print them in lexicographic order: each subset sorted ascending, then the subsets themselves ordered element-by-element with the empty subset first.

The naive approach

The first instinct is recursion without bookkeeping: "for each element, decide pick or skip." That works for enumeration but does not produce lex order, and the implementation is error-prone if you forget to copy the path on record. The second instinct is bitmask enumeration: loop a mask from 0 to 2^n - 1 and emit the indices whose bits are set. That works too, but it gives binary order, not lex order; you would have to sort afterwards.

Neither is wrong. Both are O(n * 2^n), which is optimal because the output itself has that many integers. The choice is about clarity and about which order falls out for free.

The key insight

Sort the input ascending. Then walk the array left to right, holding a path and a start index. At every recursive call, the current path is itself a finished subset: record it. Then iterate i from start to n - 1, push nums[i], recurse with start = i + 1, and pop.

Two properties fall out immediately:

  1. The start index forbids revisiting an earlier element, so every subset is reached by exactly one increasing sequence of indices. No duplicates.
  2. Because nums is sorted and we always extend rightward, subsets are discovered in lex order. No sort step at the end.

This is the same start-index pattern you will reuse for combination-sum, combinations, and every "choose without repeat" problem. Learn it once.

The canonical backtracking skeleton

def backtrack(state):
    if is_solution(state):
        record(state)
        return
    for choice in choices(state):
        apply(state, choice)
        backtrack(state)
        undo(state, choice)

For subsets the instantiation is:

  • state: (path, start)
  • is_solution: every state is a solution, record path at every entry.
  • choices: every index i in [start, n).
  • apply: path.append(nums[i]).
  • undo: path.pop().
  • prune: none. Every subset is valid.

Subsets is the rare backtracking problem with no prune, the search space is exactly the output, so pruning would lose answers.

Step-by-step approach

  1. Sort nums ascending.
  2. Define dfs(start) carrying a global path list.
  3. Inside dfs: append a copy of path to the output list.
  4. Loop i from start to n - 1: push nums[i], call dfs(i + 1), pop.
  5. Call dfs(0). Print the count 2^n, then every recorded subset.

Worked example

nums = [1, 2, 3]

Discovery order, with path shown at the moment of recording:

  1. [], empty.
  2. [1].
  3. [1, 2].
  4. [1, 2, 3].
  5. [1, 3].
  6. [2].
  7. [2, 3].
  8. [3].

Eight subsets, lex order, no duplicates. Check the expected output in the problem statement: it matches.

The bitmask alternative

n = len(nums)
out = []
for mask in range(1 << n):
    out.append([nums[i] for i in range(n) if mask & (1 << i)])
out.sort()

Same complexity, two passes (enumerate, then sort), O(1) recursion depth. The mask 0b011 picks nums[0] and nums[1]. Useful when the problem also asks "for each subset, compute f(subset)" and you want a tight loop with no recursion overhead. When n <= 30 and the inner work is cheap, this is the variant that ships in performance-critical code.

Complexity

  • Time: O(n * 2^n).
  • Space: O(n * 2^n) for the output, O(n) for the recursion stack.

The output itself dominates; you cannot do asymptotically better.

Common pitfalls

  • Appending path by reference: in Python, out.append(path) stores a pointer, and later mutations leak into the recorded subset. Always out.append(list(path)). This is the single most common bug in generated backtracking code.
  • Forgetting the undo: if you push nums[i] and recurse but forget to pop, sibling iterations see a polluted path and you emit garbage.
  • Wrong start: passing start + 1 instead of i + 1 will skip elements; passing start will duplicate them.
  • Sorting after the fact: works for correctness but is wasteful. Sort the input once and the order is free.

Where this shows up in the enterprise

  • LaunchDarkly / Optimizely: enumerate every feature-flag combination for regression testing before a release.
  • AWS IAM / Okta: list every subset of permissions a role might grant, audit which are over-privileged.
  • GitHub Actions / CircleCI: enumerate every subset of optional matrix dimensions a workflow could run with.
  • Pharma / clinical trials: every subset of drug combinations for a phase-1 dose-finding study.
  • Network testing: every subset of nodes to simulate as offline (chaos engineering at small scale).
  • Compiler optimizers: every subset of optimization passes, used in autotuning pipelines like Halide.
  • Pricing rules: every subset of discount stacks a cart engine like Shopify Plus must validate.

The power set is finite and small only when n <= ~20. Beyond that, the question shifts to "do we really need every subset, or only the ones meeting some constraint?", which is where pruning becomes essential and the problem morphs into combination-sum or subsets-with-constraints.

Variants you'll see elsewhere

  • subsets-ii: input has duplicates; skip nums[i] == nums[i-1] when i > start to avoid emitting the same subset twice.
  • combinations: enumerate only subsets of a fixed size k. Same start skeleton, base case is len(path) == k.
  • combination-sum: subsets that sum to a target, with reuse allowed. The start is the same; the recursion uses i instead of i + 1.
  • partition-equal-subset-sum: existence question. Drops backtracking entirely in favour of DP.

All four are the same DFS with a tweaked base case.

How parikshan turns this into a learning loop

The hints are graded. Hint 1 nudges you toward the binary choice (pick or skip). Hint 2 introduces the recursion shape. Hint 3 names the start index. Hint 4 hands you the full skeleton at a steep integrity cost. The dynamic private tests vary n from 0 to 10, so memorising one trace is useless, the order matters and the empty-input edge has to be handled. Run the problem in practice mode first with the AI assistant free of penalty, then redo it in exam mode where each AI request docks integrity. When you can write the backtracking skeleton in under three minutes without consulting the assistant, the pattern is in your fingers and you can move on to permutations or combination-sum.

In the AI-integrated workspace

An agent will write subsets for you in seconds. Your job is to spot three things in the generated code:

  1. The copy on record. out.append(list(path)), not out.append(path). Generated code routinely forgets this and passes the first test (one subset is returned), then fails any test that asks for the full list.
  2. The start parameter on the recursion. Without it the agent enumerates permutations of subsets, not subsets, three times the work and the wrong answer.
  3. The lex-order guarantee. If the agent writes bitmask enumeration and the spec asks for lex order, the agent will need to sort afterwards. Catch it in review or you ship sorted-by-binary output.

The senior-engineer move is to name the pattern out loud, "this is the power-set DFS with start", before reading a single line of generated code. That naming primes you to verify the three pitfalls instead of skimming the diff.

subsets