subsets
Difficulty: Medium · Topic: Backtracking · Practice it on parikshan: /practice/subsets/start
The story
You run feature-flag rollout for a SaaS like LaunchDarkly or Optimizely. A new release has nine flags that interact, and your QA lead wants test coverage on every possible combination of enabled flags before customers see them. Nine flags means 2^9 = 512 configurations. She is willing to wait while CI churns through them, but she wants the list emitted deterministically and in a sane order so the regression report is diffable across runs.
That is subsets. The same shape shows up in compliance auditing (every subset of permissions a role might be granted), in marketing (every subset of segments in an A/B/n test), and in chip design (every subset of pipeline stages to ablate). The problem is universal: enumerate the power set of an input, no item missed, no item repeated.
What's actually being asked
Given an array of n distinct integers, output every subset of it. There are exactly 2^n of them, including the empty set. Print them in lexicographic order: each subset sorted ascending, then the subsets themselves ordered element-by-element with the empty subset first.
The naive approach
The first instinct is recursion without bookkeeping: "for each element, decide pick or skip." That works for enumeration but does not produce lex order, and the implementation is error-prone if you forget to copy the path on record. The second instinct is bitmask enumeration: loop a mask from 0 to 2^n - 1 and emit the indices whose bits are set. That works too, but it gives binary order, not lex order; you would have to sort afterwards.
Neither is wrong. Both are O(n * 2^n), which is optimal because the output itself has that many integers. The choice is about clarity and about which order falls out for free.
The key insight
Sort the input ascending. Then walk the array left to right, holding a path and a start index. At every recursive call, the current path is itself a finished subset: record it. Then iterate i from start to n - 1, push nums[i], recurse with start = i + 1, and pop.
Two properties fall out immediately:
- The
startindex forbids revisiting an earlier element, so every subset is reached by exactly one increasing sequence of indices. No duplicates. - Because
numsis sorted and we always extend rightward, subsets are discovered in lex order. No sort step at the end.
This is the same start-index pattern you will reuse for combination-sum, combinations, and every "choose without repeat" problem. Learn it once.
The canonical backtracking skeleton
def backtrack(state):
if is_solution(state):
record(state)
return
for choice in choices(state):
apply(state, choice)
backtrack(state)
undo(state, choice)
For subsets the instantiation is:
- state:
(path, start) - is_solution: every state is a solution, record
pathat every entry. - choices: every index
iin[start, n). - apply:
path.append(nums[i]). - undo:
path.pop(). - prune: none. Every subset is valid.
Subsets is the rare backtracking problem with no prune, the search space is exactly the output, so pruning would lose answers.
Step-by-step approach
- Sort
numsascending. - Define
dfs(start)carrying a globalpathlist. - Inside
dfs: append a copy ofpathto the output list. - Loop
ifromstartton - 1: pushnums[i], calldfs(i + 1), pop. - Call
dfs(0). Print the count2^n, then every recorded subset.
Worked example
nums = [1, 2, 3]
Discovery order, with path shown at the moment of recording:
[], empty.[1].[1, 2].[1, 2, 3].[1, 3].[2].[2, 3].[3].
Eight subsets, lex order, no duplicates. Check the expected output in the problem statement: it matches.
The bitmask alternative
n = len(nums)
out = []
for mask in range(1 << n):
out.append([nums[i] for i in range(n) if mask & (1 << i)])
out.sort()
Same complexity, two passes (enumerate, then sort), O(1) recursion depth. The mask 0b011 picks nums[0] and nums[1]. Useful when the problem also asks "for each subset, compute f(subset)" and you want a tight loop with no recursion overhead. When n <= 30 and the inner work is cheap, this is the variant that ships in performance-critical code.
Complexity
- Time: O(n * 2^n).
- Space: O(n * 2^n) for the output, O(n) for the recursion stack.
The output itself dominates; you cannot do asymptotically better.
Common pitfalls
- Appending
pathby reference: in Python,out.append(path)stores a pointer, and later mutations leak into the recorded subset. Alwaysout.append(list(path)). This is the single most common bug in generated backtracking code. - Forgetting the undo: if you push
nums[i]and recurse but forget to pop, sibling iterations see a polluted path and you emit garbage. - Wrong
start: passingstart + 1instead ofi + 1will skip elements; passingstartwill duplicate them. - Sorting after the fact: works for correctness but is wasteful. Sort the input once and the order is free.
Where this shows up in the enterprise
- LaunchDarkly / Optimizely: enumerate every feature-flag combination for regression testing before a release.
- AWS IAM / Okta: list every subset of permissions a role might grant, audit which are over-privileged.
- GitHub Actions / CircleCI: enumerate every subset of optional matrix dimensions a workflow could run with.
- Pharma / clinical trials: every subset of drug combinations for a phase-1 dose-finding study.
- Network testing: every subset of nodes to simulate as offline (chaos engineering at small scale).
- Compiler optimizers: every subset of optimization passes, used in autotuning pipelines like Halide.
- Pricing rules: every subset of discount stacks a cart engine like Shopify Plus must validate.
The power set is finite and small only when n <= ~20. Beyond that, the question shifts to "do we really need every subset, or only the ones meeting some constraint?", which is where pruning becomes essential and the problem morphs into combination-sum or subsets-with-constraints.
Variants you'll see elsewhere
subsets-ii: input has duplicates; skipnums[i] == nums[i-1]wheni > startto avoid emitting the same subset twice.combinations: enumerate only subsets of a fixed sizek. Samestartskeleton, base case islen(path) == k.combination-sum: subsets that sum to a target, with reuse allowed. Thestartis the same; the recursion usesiinstead ofi + 1.partition-equal-subset-sum: existence question. Drops backtracking entirely in favour of DP.
All four are the same DFS with a tweaked base case.
How parikshan turns this into a learning loop
The hints are graded. Hint 1 nudges you toward the binary choice (pick or skip). Hint 2 introduces the recursion shape. Hint 3 names the start index. Hint 4 hands you the full skeleton at a steep integrity cost. The dynamic private tests vary n from 0 to 10, so memorising one trace is useless, the order matters and the empty-input edge has to be handled. Run the problem in practice mode first with the AI assistant free of penalty, then redo it in exam mode where each AI request docks integrity. When you can write the backtracking skeleton in under three minutes without consulting the assistant, the pattern is in your fingers and you can move on to permutations or combination-sum.
In the AI-integrated workspace
An agent will write subsets for you in seconds. Your job is to spot three things in the generated code:
- The copy on record.
out.append(list(path)), notout.append(path). Generated code routinely forgets this and passes the first test (one subset is returned), then fails any test that asks for the full list. - The
startparameter on the recursion. Without it the agent enumerates permutations of subsets, not subsets, three times the work and the wrong answer. - The lex-order guarantee. If the agent writes bitmask enumeration and the spec asks for lex order, the agent will need to sort afterwards. Catch it in review or you ship sorted-by-binary output.
The senior-engineer move is to name the pattern out loud, "this is the power-set DFS with start", before reading a single line of generated code. That naming primes you to verify the three pitfalls instead of skimming the diff.