minimum-window-substring
Difficulty: Hard · Topic: Sliding Window · Practice it on parikshan: /practice/minimum-window-substring/start
The story
A compliance engineer at a healthcare SaaS audits patient-record exports. Every export must contain a required set of identifiers (e.g., NPI number, DOB, MRN) within some contiguous chunk for the auditor's eye. The auditor wants the smallest contiguous excerpt of the export that still contains every required identifier. Smaller excerpts are easier to review; ideally the auditor reads three lines rather than the whole file. The required identifiers may appear with multiplicities (e.g., "MRN appears at least twice").
That is minimum-window-substring. It is the canonical hard sliding-window problem and the one that, in interviews, sorts the engineers who know the pattern from the ones who only know the easy variants.
What's actually being asked
Given strings s and t, return the smallest substring of s that contains every character of t, counting multiplicities. If multiple smallest windows tie, return the leftmost. If no window exists, return an empty string. Case-sensitive.
"Contains every character of t" means: for each distinct character c, the window's count of c is at least t's count of c. If t = "AABC", the window must have at least two As, one B, one C.
The naive approach
Three nested loops over start, end, and character match check. O(|s|^3) or O(|s|^2 * |t|). For |s| = 10^5, fatal.
A slightly smarter brute force: for each start, expand the window until it is valid, recording length. Still O(|s|^2) worst case.
The key insight
A variable-size sliding window with two counters. This is where the famous "need and have" state lives, and where most candidates trip in interviews.
need[c]= how many of charactercare still needed for validity, counted only across letters that appear int.have= a single integer counter, the number of distinct characters oftwhose multiplicity requirement is already met inside the current window.required= the number of distinct characters int.
The window is valid iff have == required. That single integer comparison is the per-slide validity check; it is O(1), and that is the entire reason the algorithm is O(|s|).
When you add a character c to the right of the window:
- If
cis not int, ignore it (it does not affectneed,have, or validity). - Else decrement
need[c]. Ifneed[c]is now exactly 0,have += 1. (It can go negative if there are extracs, that is fine; only the zero-crossing flipshave.)
When you remove a character c from the left of the window (during shrink):
- If
cis not int, ignore it. - Else, if
need[c]is currently 0, then removing thiscwill make the window deficient:have -= 1. Then incrementneed[c].
The order matters: check need[c] == 0 before the increment when removing; check need[c] == 0 after the decrement when adding.
The outer loop walks right from 0 to |s| - 1; for each, after the add, while have == required, record the current window if it beats the best, then shrink from the left.
Step-by-step approach
- If
|t| > |s|, return empty. - Build
need[c]fromt.required = number of distinct keys in need.have = 0.left = 0.best = (length=infinity, start=0). - For
rightfrom 0 to|s| - 1:c = s[right]. Ifcis inneed, decrementneed[c]. Ifneed[c] == 0,have += 1.- While
have == required:- If
right - left + 1 < best.length, updatebest = (right - left + 1, left). d = s[left]. Ifdis inneed: ifneed[d] == 0,have -= 1; thenneed[d] += 1.left += 1.
- If
- If
best.lengthis infinity, return empty. Else returns[best.start : best.start + best.length].
Worked example
s = "ADOBECODEBANC"
t = "ABC"
Index map: A@0, D@1, O@2, B@3, E@4, C@5, O@6, D@7, E@8, B@9, A@10, N@11, C@12.
Initial state: need = {A:1, B:1, C:1}, required = 3, have = 0, left = 0, best = (length=infinity, start=0).
Outer-loop add table. "flip" is "did have change after this add?".
| right | char | action on need | need after | have | flip |
|---|---|---|---|---|---|
| 0 | A | need[A]: 1 -> 0 | {A:0,B:1,C:1} | 1 | +1 |
| 1 | D | not in need, ignore | unchanged | 1 | - |
| 2 | O | not in need, ignore | unchanged | 1 | - |
| 3 | B | need[B]: 1 -> 0 | {A:0,B:0,C:1} | 2 | +1 |
| 4 | E | ignore | unchanged | 2 | - |
| 5 | C | need[C]: 1 -> 0 | {A:0,B:0,C:0} | 3 | +1 valid |
have == required for the first time at right = 5. Enter the shrink loop. Each row: "record window if strictly smaller than best, then drop left." len = right - left + 1.
| left | s[left] | window | len | smaller? | best after | drop action | have after | exit? |
|---|---|---|---|---|---|---|---|---|
| 0 | A | "ADOBEC" | 6 | yes (inf -> 6) | (6, 0) | need[A]==0, have 3 -> 2, need[A]: 0 -> 1, left=1 | 2 | yes |
Continue the outer loop from right = 6:
| right | char | action on need | need after | have | flip |
|---|---|---|---|---|---|
| 6 | O | ignore | unchanged | 2 | - |
| 7 | D | ignore | unchanged | 2 | - |
| 8 | E | ignore | unchanged | 2 | - |
| 9 | B | need[B]: 0 -> -1 (no zero crossing) | {A:1,B:-1,C:0} | 2 | - |
| 10 | A | need[A]: 1 -> 0 | {A:0,B:-1,C:0} | 3 | +1 valid |
Enter shrink with left = 1, right = 10:
| left | s[left] | window | len | smaller? | best after | drop action | have after | exit? |
|---|---|---|---|---|---|---|---|---|
| 1 | D | "DOBECODEBA" | 10 | no (6) | (6, 0) | ignore, left=2 | 3 | no |
| 2 | O | "OBECODEBA" | 9 | no | (6, 0) | ignore, left=3 | 3 | no |
| 3 | B | "BECODEBA" | 8 | no | (6, 0) | need[B]==-1, have unchanged, need[B]: -1 -> 0, left=4 | 3 | no |
| 4 | E | "ECODEBA" | 7 | no | (6, 0) | ignore, left=5 | 3 | no |
| 5 | C | "CODEBA" | 6 | tied (leftmost wins, no update) | (6, 0) | need[C]==0, have 3 -> 2, need[C]: 0 -> 1, left=6 | 2 | yes |
Note the key subtlety at left = 3: dropping the first B leaves the second B still in the window (we already saw it at right = 9), so have must not drop. The need[B] == -1 check before the increment is exactly the predicate that protects this.
Continue the outer loop:
| right | char | action on need | need after | have | flip |
|---|---|---|---|---|---|
| 11 | N | ignore | unchanged | 2 | - |
| 12 | C | need[C]: 1 -> 0 | {A:1,B:0,C:0} | 3 | +1 valid |
Enter shrink with left = 6, right = 12:
| left | s[left] | window | len | smaller? | best after | drop action | have after | exit? |
|---|---|---|---|---|---|---|---|---|
| 6 | O | "ODEBANC" | 7 | no | (6, 0) | ignore, left=7 | 3 | no |
| 7 | D | "DEBANC" | 6 | tied | (6, 0) | ignore, left=8 | 3 | no |
| 8 | E | "EBANC" | 5 | yes (6 -> 5) | (5, 8) | ignore, left=9 | 3 | no |
| 9 | B | "BANC" | 4 | yes (5 -> 4) | (4, 9) | need[B]==0, have 3 -> 2, need[B]: 0 -> 1, left=10 | 2 | yes |
Outer loop terminates (right has walked to the end). Final answer: s[9..12] = "BANC", length 4. That matches the expected output.
The trace earns its keep at exactly two moments: the second-B drop at left = 3 (where need[B] == -1 keeps have pinned at 3), and the final shrink at right = 12 where the window shrinks from 7 to 4 inside a single outer-loop iteration. Both are the reason the algorithm is O(|s|) rather than O(|s| * |t|).
Complexity
- Time: O(|s| + |t|). Each of
leftandrightmoves at most|s|times. - Space: O(|t|) for the
needmap. EffectivelyO(1)for fixed alphabets.
The pointer-amortisation is the same as in the easy sliding-window problems. What is different here is the have/need state, which is the interview-killer: candidates write either an O(|s| * |t|) re-validation per slide, or an unmaintained counter that drifts, or both. Get the counter right and the rest follows.
Common pitfalls
- Recomputing window validity from scratch each iteration: turns the algorithm into
O(|s| * |t|). Times out. - Updating
haveon every change toneed[c], not only on zero-crossings:haveis supposed to count "satisfied distinct characters", which flips only whenneed[c]crosses zero (transitions1 -> 0adds one to have; transitions0 -> 1subtracts one). - Forgetting characters that are not in
t: they do not affectneed,have, or validity, but they still slide through the window. Skip the bookkeeping for them. - Tie-breaking on smallest length: the spec says leftmost on ties. Update best only on strictly smaller length, not "less than or equal".
- Returning a default like
""when a window exists: be careful with the "no window exists" sentinel.
Where this shows up in the enterprise
- Healthcare audit tools (Epic, Cerner, Athena): minimum-excerpt extraction for compliance reviewers.
- Legal e-discovery (Relativity, Logikcull): minimum-passage extraction containing all of a set of search terms.
- Splunk / Elasticsearch / Datadog log search: smallest contiguous log slice covering all required event types.
- Google Search snippet generation: smallest passage containing all query terms.
- GitHub code search: smallest contiguous code region containing all queried tokens.
- Stripe Radar fraud-rule editor: minimum-event window covering a configured set of signals.
- Genomics (BLAST, Bowtie2): smallest read-span containing every diagnostic motif.
- News aggregation (Bloomberg, Reuters): minimum-paragraph extraction containing all subject keywords.
The shape: "a covering excerpt that is as small as possible." Sliding window with need/have is the answer in every case.
Variants you'll see elsewhere
substring-with-concatenation-of-all-words: like this but the alphabet is a set of words.permutation-in-string: fixed window, multiset equality not coverage.minimum-window-subsequence: subsequence (not contiguous) variant, requires DP.smallest-range-covering-elements-from-k-lists: same idea, k separate streams.find-all-anagrams-in-a-string: enumerate all anagram-windows.
How parikshan turns this into a learning loop
The dynamic private tests on parikshan are merciless on this problem:
twith duplicate characters (AABCrequires two A's; counter must handle multiplicities).tlonger thans(return empty).t == s(returns).- Multiple windows tied on length (return leftmost).
scontains characters not int(must skip them in bookkeeping but include them in slide).- Case-sensitive:
a != A.
The integrity-scored AI sparring partner is the right place, in practice mode, to verbalise the need/have invariant before writing the code. If you cannot say it in one English sentence ("have is the number of distinct characters in t whose multiplicity in the window is at least the required count, and the window is valid iff have == required") then writing the code first will produce subtle bugs.
In the AI-integrated workspace
Most agent-generated solutions to this problem are wrong on duplicate-character tests because the agent uses have == len(t) or have == |set(t)| inconsistently. Your review checks:
- Is
requiredthe count of distinct characters oft, not the length oft? - Is
haveupdated only on zero-crossings ofneed[c], not on every change? - Is the shrink loop's exit condition
have < required(equivalent:have != requiredafter a successful drop)? - Is the tie-break on length strict, so the leftmost minimum wins?
This is also the problem where you must verify the agent ran a duplicate-character test. If the only test was t = "ABC", the agent's bug is invisible. Add t = "AABC" to every code review for this pattern. parikshan's dynamic tests do this for you in practice; in production, you do it by hand.
minimum-window-substring is the moment a candidate moves from "I have written a sliding window" to "I understand sliding windows". Treat the practice session here as the gating moment of the cluster.