sliding-window-maximum

Difficulty: Hard · Topic: Sliding Window · Practice it on parikshan: /practice/sliding-window-maximum/start

The story

At Citadel's market-making desk, a real-time risk system tracks the maximum bid for an instrument over the last 100 milliseconds. Every millisecond, a new bid arrives and the oldest expires. The desk needs the current 100-ms max instantly: it feeds into the spread the market-maker quotes back. The naive "scan the last 100 values" runs in 100x the tick rate, which the desk cannot afford. They need an O(1) amortised per-tick algorithm.

That is sliding-window-maximum. The data structure that solves it is the monotonic deque, and once you learn it here, you spot it in stream-processing systems for the rest of your career.

What's actually being asked

Given an integer array nums and an integer k, output the maximum of every contiguous subarray of size k from left to right. There are n - k + 1 such windows; print their maxima in order.

The naive approach

For each of the n - k + 1 windows, scan all k elements for the max. O(n * k). For n = 10^5 and k = 5 * 10^4, that is 5 billion. Times out.

The first improvement: priority queue

A max-heap holding the window's elements gives O(log k) per push and per pop. Total: O(n log k). Better, but with a subtle gotcha: when an element leaves the window, you cannot remove it from the middle of the heap in O(log k) without an index. The standard fix is "lazy deletion": store (value, index) in the heap and, when peeking the top, discard entries whose index is outside the window. The amortised cost is still O(log k) per operation, total O(n log k).

For n = 10^5, n log k is ~ 1.7 million operations. Fine. But there is something better.

The key insight

Maintain a deque of indices with two invariants:

The deque indices are in increasing order left to right.
The values at those indices, in deque order, are strictly decreasing.

Why this works:

The front of the deque is the index of the current window's maximum. It is the leftmost of the decreasing sequence, so it is the largest.
When you add a new index i to the back: pop indices from the back whose values are <= nums[i], because those values can never again be the max (a larger value is in the window now and persists at least as long as they would). Then push i.
When the front index falls out of the window (front index < i - k + 1), pop it from the front.

Each index is pushed once and popped at most once. Total work: O(n). Each window's max is nums[deque.front()], queried in O(1).

This is the canonical monotonic-deque pattern. It generalises to "min in window" (flip the comparison), "k-th largest" (harder), and any "extremum-in-sliding-window" problem.

Why the deque beats the heap

The heap is O(n log k), the deque is O(n). The asymptotic gap is small for moderate k, but the constant factor is dramatic: the deque does only one comparison per element on average and never allocates beyond what the deque holds, whereas the heap does log k comparisons and pointer chases per operation. In a market-making system processing millions of ticks per second, the deque wins decisively.

The deeper insight: the heap maintains a full ordering, which is more than you need. You only need to know what could become the future max. The decreasing-by-value invariant captures exactly that: elements smaller than a later element will never be the max again. The deque discards them. The heap remembers them and pays for the bookkeeping.

Step-by-step approach

dq = empty deque of indices. out = list of window maxima.
For i from 0 to n - 1:
- Pop front if expired: while dq is non-empty and dq.front() < i - k + 1, pop the front.
- Pop back while smaller: while dq is non-empty and nums[dq.back()] <= nums[i], pop the back.
- Push i to the back.
- If i >= k - 1, append nums[dq.front()] to out.
Print out space-separated.

Worked example

nums = [1, 3, -1, -3, 5, 3, 6, 7], k = 3

i	nums[i]	deque before	pop front (expired)	pop back (<=)	deque after push	window max (if i>=2)
0	1	[]	-	-	[0]	-
1	3	[0]	-	pop 0 (1<=3)	[1]	-
2	-1	[1]	-	(-1 < 3)	[1, 2]	nums[1] = 3
3	-3	[1, 2]	-	(-3 < -1)	[1, 2, 3]	nums[1] = 3
4	5	[1, 2, 3]	pop 1 (1 < 4-3+1=2)? No, 1<2 yes, pop	pop 3 (-3<=5), pop 2 (-1<=5)	[4]	nums[4] = 5
5	3	[4]	-	(3 < 5)	[4, 5]	nums[4] = 5
6	6	[4, 5]	-	pop 5 (3<=6), pop 4 (5<=6)	[6]	nums[6] = 6
7	7	[6]	-	pop 6 (6<=7)	[7]	nums[7] = 7

Output: 3 3 5 5 6 7.

Complexity

Time: O(n). Each index is pushed once and popped at most once.
Space: O(k) for the deque (it holds at most k indices).

Compare:

Brute force: O(n * k).
Heap with lazy deletion: O(n log k).
Monotonic deque: O(n).

For k = 1, all three are the same. For k = n, brute force is O(n^2), heap is O(n log n), deque is O(n).

Common pitfalls

Storing values instead of indices in the deque: you cannot tell when an element has expired without its index. Always store indices.
Comparing with < instead of <= on the back-pop: with <, equal values both stay in the deque, which is correct but wasteful. With <=, you keep only the leftmost (which expires soonest), which is the convention. Both yield correct maxima.
Forgetting to pop expired front before reading: the front index might be outside the window. Always expire-check first.
Recording the window max before the window is full: only emit output when i >= k - 1.
Reaching for a heap because it is the "obvious" choice: it works but is asymptotically and constant-factor slower. The deque is the right answer; in interviews, naming it earns serious credit.

Where this shows up in the enterprise

Citadel / DRW / Jane Street / HRT market-making: real-time max/min over a tick window.
Cloudflare DDoS attack scoring: rolling-window max of inbound packet rate per source.
Datadog / New Relic rolling max of latency, used for p100/SLO calculations.
AWS / GCP autoscalers: max CPU over the last N seconds drives scale-out decisions.
Netflix / YouTube CDN: max throughput in the last window decides bitrate ladder.
Algorithmic trading at Two Sigma / Renaissance: rolling-window max of volatility for adaptive position sizing.
Robotics / control systems: max sensor reading in a sliding time window for fault detection.
NVIDIA / Intel performance counters: max GPU util in the last k samples for thermal throttle decisions.

Anywhere "max/min over the last k events" matters in a hot loop, this algorithm is the right one.

Variants you'll see elsewhere

sliding-window-minimum: flip the comparison; same structure.
find-median-from-data-stream: harder, two heaps.
largest-rectangle-in-histogram: same monotonic stack mindset, different problem geometry.
shortest-subarray-with-sum-at-least-k: monotonic deque on prefix sums.
constrained-subsequence-sum: DP value + sliding-window max, beautiful composition.

How parikshan turns this into a learning loop

The dynamic private tests on parikshan probe the corners:

k = 1 (every element is its own max; tests that the deque does not over-eagerly pop).
k = n (one window, the global max; tests the "expire front" path is correctly not taken).
Strictly increasing array (deque always has size 1, since every new element pops the previous).
Strictly decreasing array (deque grows to size k and shifts as the front expires).
Equal values (tests < versus <= on back-pop).
Large n near the constraint cap (proves you implemented the deque and not the n * k brute force).

The parikshan integrity-scored AI sparring partner is the right place, in practice mode, to debate "why the deque versus the heap" before exam mode. Articulating the two-invariant argument out loud is what cements the pattern.

In the AI-integrated workspace

The biggest agent failure on this problem: the agent reaches for a heap. It works on small tests, passes the sample, and looks correct. Your code review:

Is it O(n) or O(n log k)? Ask for the complexity claim; check it matches the implementation.
Are you storing indices in the deque, not values?
Are you expiring the front before you read the max?
Is the back-pop using <= (or <), and does the choice match what the spec asks for on ties?

A senior engineer reading agent-generated stream-max code should be able to say in one line: "Switch this from a heap to a monotonic deque; the deque is O(n) and avoids the lazy-deletion bookkeeping." That one-line refactor is the difference between an algorithm that fits in a 10-microsecond tick budget and one that doesn't.

sliding-window-maximum is the bridge from "sliding window with hash maps" to "sliding window with monotonic structures". Mastering the deque here unlocks a whole tier of subsequent problems. Treat the parikshan practice run on this problem as the moment that bridge gets built.