generate-parentheses

Difficulty: Medium · Topic: Stack · Practice it on parikshan: /practice/generate-parentheses/start

The story

A team at JetBrains writing the IntelliJ refactoring engine wants to enumerate all syntactically valid ways to wrap a sequence of n function calls in matched parentheses for a "rewrite to applicative form" suggestion. The IDE shows the user every valid grouping and asks them to pick. They give you n and want every well-formed string of n open and n close parens, in lexicographic order. The count for n = 4 is 14 (the fourth Catalan number); for n = 10 it is 16,796. Manageable for n up to 10.

That is generate-parentheses. It is the canonical "constrained generation" problem, and its solution is a backtracking recursion that uses the stack metaphor implicitly through the call stack and the matching invariant.

What's actually being asked

Given an integer n, output the count C_n of well-formed parentheses strings of n open and n close parens, followed by the strings themselves in lexicographic order.

C_n is the n-th Catalan number: 1, 2, 5, 14, 42, 132, ... for n = 1, 2, 3, 4, 5, 6, ....

The naive approach

Enumerate all 2^(2n) binary strings of length 2n, treat 0 as ( and 1 as ), and filter by validity. O(4^n) work and most candidates are invalid. For n = 10, that is 4^10 = 10^6 and 16,796 valid strings, passable, but the filter discards 98% of work.

The key insight

A partial string is viable (can be extended to a valid one) iff:

The number of ( used so far, o, satisfies o <= n.
The number of ) used so far, c, satisfies c <= o.

The second condition is the stack invariant: every prefix must have at least as many opens as closes. Equivalent: at every point you can think of a virtual stack of unmatched opens; the stack never goes negative, and at the end it must be exactly empty (which means o == c == n).

The recursion: from a partial string with o opens and c closes,

If o == n and c == n, emit the string.
If o < n, recurse with one more open: o+1, c.
If c < o, recurse with one more close: o, c+1.

The two branches together prune all invalid prefixes. The result is exactly C_n strings, in lexicographic order if you emit the open branch before the close branch (because ( is lexicographically less than )).

How the stack maps to the recursion

The "stack of unmatched opens" is implicit:

o - c is the current stack size.
o < n corresponds to "stack can grow" (push).
c < o corresponds to "stack is non-empty, can pop" (pop and match).

You never need a literal stack. The recursion's call stack is the stack of decisions; the invariant o >= c keeps the bracket stack non-negative at every step. This is the cleanest example of "the structure of the problem is the structure of the algorithm" in the stack chapter.

Step-by-step approach

results = [], path = [] (a char list, used as the in-progress string).
Define def gen(o, c):
- If len(path) == 2 * n: append ''.join(path) to results; return.
- If o < n: append '(' to path; call gen(o + 1, c); pop from path.
- If c < o: append ')' to path; call gen(o, c + 1); pop from path.
Call gen(0, 0).
Print len(results). Then print each string on its own line.

Because the open branch fires before the close branch and the recursion is depth-first, the strings emerge in lexicographic order naturally. No sort needed.

Worked example

n = 2

The recursion tree:

gen(0,0)               -- path: ""
  gen(1,0)             -- path: "("
    gen(2,0)           -- path: "(("
      gen(2,1)         -- path: "(()"
        gen(2,2) EMIT  -- "(())"
    gen(1,1)           -- path: "()"
      gen(2,1)         -- path: "()("
        gen(2,2) EMIT  -- "()()"

Output:

2
(())
()()

For n = 3, the five Catalan strings are: ((())), (()()), (())(), ()(()), ()()().

Complexity

Time: O(4^n / sqrt(n)), the size of the output, since each valid string is one unit of work and the Catalan numbers grow as ~4^n / (n * sqrt(n)).
Space: O(n) for the recursion stack (plus the output, which is part of the answer not the algorithm's working memory).

For n <= 10, this is fast: C_10 = 16,796.

Common pitfalls

Generating all 2^(2n) strings and filtering: works, but does 60x more work than backtracking on n = 10. Worse for larger n.
Using a literal stack in the recursion: redundant; the call stack and the o, c counters already encode the stack's state.
Emitting strings out of lexicographic order: if you recurse on ')' before '(', the order is wrong. Always recurse on '(' first.
Mutating a shared list without restoring on return: classic backtracking bug. Always pop after the recursive call.
Forgetting the path.append / path.pop pairing: leaves stale state in subsequent siblings.

Where this shows up in the enterprise

IDE refactoring suggestions (IntelliJ, VS Code, Eclipse): enumerating valid groupings during a "rewrite to ..." refactor.
Compiler test-case generation: building syntactically valid expressions of bounded size to stress-test parsers.
Property-based testing (Hypothesis, QuickCheck): shrinking generators for grouped expressions.
SAT solver test harnesses: generating valid CNF substructures.
DNA / RNA secondary-structure prediction: matching base pairs map to valid bracket strings; enumerating candidate folds.
Chemistry tools (RDKit): SMILES branch enumeration uses similar stack-balance generation.
Grammars for fuzzing (LLVM libFuzzer): structured input generation that respects nesting.

The generalisation is constrained tree-enumeration with a balance invariant. Wherever it shows up, backtracking with a "running balance" counter is the right tool.

Variants you'll see elsewhere

letter-combinations-phone: pure enumeration without a balance constraint.
subsets, permutations, combinations: same backtracking skeleton with different prune rules.
restore-ip-addresses: backtracking with a length-and-value constraint.
palindrome-partitioning: same skeleton, predicate is "is this prefix a palindrome?".
n-queens: backtracking with multiple parallel constraints (column, diagonals).

How parikshan turns this into a learning loop

The dynamic private tests on parikshan check:

n = 1 (one string: ()).
n = 2 (two strings; ordering matters).
n = 10 (the upper limit; tests that you do not over-generate and time out).
Stress on counting: the first line of output must be C_n, and miscounting is a common bug.
Strict lexicographic ordering: any out-of-order string fails the byte-exact check.

The integrity-scored AI sparring partner is the right place, in practice mode, to verbalise why the o >= c invariant suffices before exam mode. Once you can say "any prefix violating that invariant cannot be extended to a valid string", the recursion writes itself.

In the AI-integrated workspace

An agent generating parenthesis-enumeration code might use either backtracking or BFS over a state graph. Both work for small n. Your review:

Is the algorithm using two counters (open used, close used) or a literal stack? The counters are cleaner; ask the agent to refactor if it built a stack.
Is the recursion depth-first with open before close, guaranteeing lex order without a sort?
Are the counters incremented before the recursive call and decremented (or restored via slice) after?
Is the base case len(path) == 2 * n, or some incorrect proxy?

In production, code that enumerates a constrained language is rare but high-stakes (fuzzing, code generation, refactoring). The agent's correctness on this kind of code is brittle. A senior reviewer should be able to read the recursion and immediately spot the missing path.pop() or the swapped branch order. parikshan trains that reviewer.

generate-parentheses is the moment the stack metaphor meets backtracking. The stack is the shape of the problem; backtracking is the algorithm that explores valid shapes. Master both here and the rest of the backtracking canon follows.