parikshan

minimum-window-substring

Difficulty: Hard  ·  Topic: Sliding Window  ·  Practice it on parikshan: /practice/minimum-window-substring/start

The story

A compliance engineer at a healthcare SaaS audits patient-record exports. Every export must contain a required set of identifiers (e.g., NPI number, DOB, MRN) within some contiguous chunk for the auditor's eye. The auditor wants the smallest contiguous excerpt of the export that still contains every required identifier. Smaller excerpts are easier to review; ideally the auditor reads three lines rather than the whole file. The required identifiers may appear with multiplicities (e.g., "MRN appears at least twice").

That is minimum-window-substring. It is the canonical hard sliding-window problem and the one that, in interviews, sorts the engineers who know the pattern from the ones who only know the easy variants.

What's actually being asked

Given strings s and t, return the smallest substring of s that contains every character of t, counting multiplicities. If multiple smallest windows tie, return the leftmost. If no window exists, return an empty string. Case-sensitive.

"Contains every character of t" means: for each distinct character c, the window's count of c is at least t's count of c. If t = "AABC", the window must have at least two As, one B, one C.

The naive approach

Three nested loops over start, end, and character match check. O(|s|^3) or O(|s|^2 * |t|). For |s| = 10^5, fatal.

A slightly smarter brute force: for each start, expand the window until it is valid, recording length. Still O(|s|^2) worst case.

The key insight

A variable-size sliding window with two counters. This is where the famous "need and have" state lives, and where most candidates trip in interviews.

  • need[c] = how many of character c are still needed for validity, counted only across letters that appear in t.
  • have = a single integer counter, the number of distinct characters of t whose multiplicity requirement is already met inside the current window.
  • required = the number of distinct characters in t.

The window is valid iff have == required. That single integer comparison is the per-slide validity check; it is O(1), and that is the entire reason the algorithm is O(|s|).

When you add a character c to the right of the window:

  • If c is not in t, ignore it (it does not affect need, have, or validity).
  • Else decrement need[c]. If need[c] is now exactly 0, have += 1. (It can go negative if there are extra cs, that is fine; only the zero-crossing flips have.)

When you remove a character c from the left of the window (during shrink):

  • If c is not in t, ignore it.
  • Else, if need[c] is currently 0, then removing this c will make the window deficient: have -= 1. Then increment need[c].

The order matters: check need[c] == 0 before the increment when removing; check need[c] == 0 after the decrement when adding.

The outer loop walks right from 0 to |s| - 1; for each, after the add, while have == required, record the current window if it beats the best, then shrink from the left.

Step-by-step approach

  1. If |t| > |s|, return empty.
  2. Build need[c] from t. required = number of distinct keys in need. have = 0. left = 0. best = (length=infinity, start=0).
  3. For right from 0 to |s| - 1:
    • c = s[right]. If c is in need, decrement need[c]. If need[c] == 0, have += 1.
    • While have == required:
      • If right - left + 1 < best.length, update best = (right - left + 1, left).
      • d = s[left]. If d is in need: if need[d] == 0, have -= 1; then need[d] += 1.
      • left += 1.
  4. If best.length is infinity, return empty. Else return s[best.start : best.start + best.length].

Worked example

s = "ADOBECODEBANC"
t = "ABC"

Index map: A@0, D@1, O@2, B@3, E@4, C@5, O@6, D@7, E@8, B@9, A@10, N@11, C@12.

Initial state: need = {A:1, B:1, C:1}, required = 3, have = 0, left = 0, best = (length=infinity, start=0).

Outer-loop add table. "flip" is "did have change after this add?".

rightcharaction on needneed afterhaveflip
0Aneed[A]: 1 -> 0{A:0,B:1,C:1}1+1
1Dnot in need, ignoreunchanged1-
2Onot in need, ignoreunchanged1-
3Bneed[B]: 1 -> 0{A:0,B:0,C:1}2+1
4Eignoreunchanged2-
5Cneed[C]: 1 -> 0{A:0,B:0,C:0}3+1 valid

have == required for the first time at right = 5. Enter the shrink loop. Each row: "record window if strictly smaller than best, then drop left." len = right - left + 1.

lefts[left]windowlensmaller?best afterdrop actionhave afterexit?
0A"ADOBEC"6yes (inf -> 6)(6, 0)need[A]==0, have 3 -> 2, need[A]: 0 -> 1, left=12yes

Continue the outer loop from right = 6:

rightcharaction on needneed afterhaveflip
6Oignoreunchanged2-
7Dignoreunchanged2-
8Eignoreunchanged2-
9Bneed[B]: 0 -> -1 (no zero crossing){A:1,B:-1,C:0}2-
10Aneed[A]: 1 -> 0{A:0,B:-1,C:0}3+1 valid

Enter shrink with left = 1, right = 10:

lefts[left]windowlensmaller?best afterdrop actionhave afterexit?
1D"DOBECODEBA"10no (6)(6, 0)ignore, left=23no
2O"OBECODEBA"9no(6, 0)ignore, left=33no
3B"BECODEBA"8no(6, 0)need[B]==-1, have unchanged, need[B]: -1 -> 0, left=43no
4E"ECODEBA"7no(6, 0)ignore, left=53no
5C"CODEBA"6tied (leftmost wins, no update)(6, 0)need[C]==0, have 3 -> 2, need[C]: 0 -> 1, left=62yes

Note the key subtlety at left = 3: dropping the first B leaves the second B still in the window (we already saw it at right = 9), so have must not drop. The need[B] == -1 check before the increment is exactly the predicate that protects this.

Continue the outer loop:

rightcharaction on needneed afterhaveflip
11Nignoreunchanged2-
12Cneed[C]: 1 -> 0{A:1,B:0,C:0}3+1 valid

Enter shrink with left = 6, right = 12:

lefts[left]windowlensmaller?best afterdrop actionhave afterexit?
6O"ODEBANC"7no(6, 0)ignore, left=73no
7D"DEBANC"6tied(6, 0)ignore, left=83no
8E"EBANC"5yes (6 -> 5)(5, 8)ignore, left=93no
9B"BANC"4yes (5 -> 4)(4, 9)need[B]==0, have 3 -> 2, need[B]: 0 -> 1, left=102yes

Outer loop terminates (right has walked to the end). Final answer: s[9..12] = "BANC", length 4. That matches the expected output.

The trace earns its keep at exactly two moments: the second-B drop at left = 3 (where need[B] == -1 keeps have pinned at 3), and the final shrink at right = 12 where the window shrinks from 7 to 4 inside a single outer-loop iteration. Both are the reason the algorithm is O(|s|) rather than O(|s| * |t|).

Complexity

  • Time: O(|s| + |t|). Each of left and right moves at most |s| times.
  • Space: O(|t|) for the need map. Effectively O(1) for fixed alphabets.

The pointer-amortisation is the same as in the easy sliding-window problems. What is different here is the have/need state, which is the interview-killer: candidates write either an O(|s| * |t|) re-validation per slide, or an unmaintained counter that drifts, or both. Get the counter right and the rest follows.

Common pitfalls

  • Recomputing window validity from scratch each iteration: turns the algorithm into O(|s| * |t|). Times out.
  • Updating have on every change to need[c], not only on zero-crossings: have is supposed to count "satisfied distinct characters", which flips only when need[c] crosses zero (transitions 1 -> 0 adds one to have; transitions 0 -> 1 subtracts one).
  • Forgetting characters that are not in t: they do not affect need, have, or validity, but they still slide through the window. Skip the bookkeeping for them.
  • Tie-breaking on smallest length: the spec says leftmost on ties. Update best only on strictly smaller length, not "less than or equal".
  • Returning a default like "" when a window exists: be careful with the "no window exists" sentinel.

Where this shows up in the enterprise

  • Healthcare audit tools (Epic, Cerner, Athena): minimum-excerpt extraction for compliance reviewers.
  • Legal e-discovery (Relativity, Logikcull): minimum-passage extraction containing all of a set of search terms.
  • Splunk / Elasticsearch / Datadog log search: smallest contiguous log slice covering all required event types.
  • Google Search snippet generation: smallest passage containing all query terms.
  • GitHub code search: smallest contiguous code region containing all queried tokens.
  • Stripe Radar fraud-rule editor: minimum-event window covering a configured set of signals.
  • Genomics (BLAST, Bowtie2): smallest read-span containing every diagnostic motif.
  • News aggregation (Bloomberg, Reuters): minimum-paragraph extraction containing all subject keywords.

The shape: "a covering excerpt that is as small as possible." Sliding window with need/have is the answer in every case.

Variants you'll see elsewhere

  • substring-with-concatenation-of-all-words: like this but the alphabet is a set of words.
  • permutation-in-string: fixed window, multiset equality not coverage.
  • minimum-window-subsequence: subsequence (not contiguous) variant, requires DP.
  • smallest-range-covering-elements-from-k-lists: same idea, k separate streams.
  • find-all-anagrams-in-a-string: enumerate all anagram-windows.

How parikshan turns this into a learning loop

The dynamic private tests on parikshan are merciless on this problem:

  • t with duplicate characters (AABC requires two A's; counter must handle multiplicities).
  • t longer than s (return empty).
  • t == s (return s).
  • Multiple windows tied on length (return leftmost).
  • s contains characters not in t (must skip them in bookkeeping but include them in slide).
  • Case-sensitive: a != A.

The integrity-scored AI sparring partner is the right place, in practice mode, to verbalise the need/have invariant before writing the code. If you cannot say it in one English sentence ("have is the number of distinct characters in t whose multiplicity in the window is at least the required count, and the window is valid iff have == required") then writing the code first will produce subtle bugs.

In the AI-integrated workspace

Most agent-generated solutions to this problem are wrong on duplicate-character tests because the agent uses have == len(t) or have == |set(t)| inconsistently. Your review checks:

  1. Is required the count of distinct characters of t, not the length of t?
  2. Is have updated only on zero-crossings of need[c], not on every change?
  3. Is the shrink loop's exit condition have < required (equivalent: have != required after a successful drop)?
  4. Is the tie-break on length strict, so the leftmost minimum wins?

This is also the problem where you must verify the agent ran a duplicate-character test. If the only test was t = "ABC", the agent's bug is invisible. Add t = "AABC" to every code review for this pattern. parikshan's dynamic tests do this for you in practice; in production, you do it by hand.

minimum-window-substring is the moment a candidate moves from "I have written a sliding window" to "I understand sliding windows". Treat the practice session here as the gating moment of the cluster.