parikshan

merge-intervals

Difficulty: Medium  ·  Topic: Intervals  ·  Practice it on parikshan: /practice/merge-intervals/start

The story

The billing team at Stripe is prorating a subscription. A customer signed up on the 5th, upgraded on the 12th, downgraded on the 20th, and cancelled on the 28th. Each plan window is a closed interval of days. To compute the right monthly charge, the billing engine first merges every pair of overlapping or touching subscription windows into a single window so it can apply the correct unit rate to each. The same engine runs at Adyen, Square, and Razorpay. Every SaaS billing system, every PTO calendar at Workday, every conflict resolver in Google Calendar, every concurrent-execution counter at AWS Lambda: they all start with the same primitive operation, merge intervals.

This is the canonical interval problem. The technique is one line of English: sort by start, then sweep. Once you have it, every other interval problem reduces to a tiny variation on this sweep.

What's actually being asked

You are given n closed intervals [s_i, e_i]. Merge every group of overlapping or touching intervals into a single interval. Output the result sorted by start. Two intervals [a, b] and [c, d] overlap or touch when c <= b (assuming a <= c).

Examples:

  • [[1,3], [2,6], [8,10], [15,18]][[1,6], [8,10], [15,18]]. The first two merge; the rest stand alone.
  • [[1,4], [4,5]][[1,5]]. Touching counts as merging (one ends at 4, the next starts at 4).
  • [][]. No intervals, no output.
  • [[1,4], [0,4]][[0,4]]. Same merged window regardless of input order; the sort handles it.

The naive approach

Compare every pair. If two intervals overlap, replace them with their union and restart. Repeat until no pair overlaps.

def merge_naive(intervals):
    changed = True
    while changed:
        changed = False
        for i in range(len(intervals)):
            for j in range(i + 1, len(intervals)):
                a, b = intervals[i], intervals[j]
                if a[1] >= b[0] and b[1] >= a[0]:
                    intervals[i] = [min(a[0], b[0]), max(a[1], b[1])]
                    intervals.pop(j)
                    changed = True
                    break
            if changed: break
    return sorted(intervals)

Correct, O(n^3) in the worst case. For n = 10^5 this is 10^15 operations, astronomically too slow. There is a much better algorithm.

The key insight

If the intervals are sorted by start, then two intervals belong to the same merged block if and only if every adjacent pair in sorted order is overlapping or touching. Why? Suppose A, M, B are three intervals in sorted order with A.start ≤ M.start ≤ B.start. If A overlaps B (so B.start ≤ A.end), then M.start ≤ B.start ≤ A.end, so M overlaps A too. The transitive structure means a single left-to-right sweep catches every merge.

So the algorithm is:

  1. Sort the intervals by start.
  2. Sweep. Hold a "current merged interval" (cs, ce). For each next (s, e):
    • If s <= ce, the new interval overlaps or touches; extend ce = max(ce, e).
    • Otherwise, the gap is real; commit (cs, ce) to the output and reset (cs, ce) = (s, e).
  3. Don't forget the final commit of the last open interval after the loop.

That is the entire problem. Sort once, sweep once.

Step-by-step approach

def merge(intervals):
    if not intervals: return []
    intervals.sort(key=lambda x: x[0])
    cs, ce = intervals[0]
    out = []
    for s, e in intervals[1:]:
        if s <= ce:
            if e > ce: ce = e
        else:
            out.append((cs, ce))
            cs, ce = s, e
    out.append((cs, ce))
    return out

Watch the <= carefully; it is what makes touching merge. If the spec said "touching does not merge" you would write s < ce instead. That one character is the whole spec.

Worked example

intervals = [[1,3], [2,6], [8,10], [15,18]].

After sort (already sorted): same order.

Sweep:

  • Start with (cs, ce) = (1, 3).
  • Next (2, 6): 2 <= 3, overlap. Extend ce = max(3, 6) = 6. State: (1, 6).
  • Next (8, 10): 8 > 6, gap. Commit (1, 6). Reset (cs, ce) = (8, 10).
  • Next (15, 18): 15 > 10, gap. Commit (8, 10). Reset (cs, ce) = (15, 18).
  • Loop ends. Commit (15, 18).

Output: [(1, 6), (8, 10), (15, 18)]. ✓

Touching case: [[1,4], [4,5]]. After sort: same. Start (1, 4). Next (4, 5): 4 <= 4, extend ce = 5. State (1, 5). Loop ends. Commit (1, 5). ✓

Complexity

  • Time: O(n log n) for the sort; the sweep is O(n). Sort dominates.
  • Space: O(n) for the output list. The sort is typically in-place in Python (TimSort uses O(n) auxiliary in the worst case but practically much less).

For n = 10^5 this is around 1.7 million comparisons for the sort plus 100k sweep steps, well under one second in Python.

Common pitfalls

  • Forgetting to sort. The sweep correctness rests entirely on sorted-by-start input. With unsorted input you will miss merges and commit duplicate windows.
  • Wrong comparison for touching. The spec for this problem says touching merges, so the condition is s <= ce. Half the interval problems in the wild treat touching as non-overlapping; misreading the spec is the single most common bug.
  • Forgetting the final commit. The loop body commits when a gap is found, but the last open interval has nothing after it to trigger a gap. Append (cs, ce) once more after the loop.
  • Sorting by end time instead of start time. That works for some interval problems (notably interval-scheduling-maximization) but not this one. Sort by start.
  • Mutating the input list during iteration. Build a fresh output list; don't pop from the input mid-loop.
  • Empty input. Return [] without entering the loop. The reference solution prints nothing; be sure your code does too.

Where this shows up in the enterprise

  • Google Calendar and Microsoft Outlook: merging overlapping busy blocks before showing the "find a time" view to the user.
  • Stripe billing and Adyen subscription proration: collapsing overlapping plan windows into a single chargeable interval.
  • AWS Lambda concurrent-execution counting: when reporting customer concurrency over a window, merge overlapping invocation lifetimes.
  • Uber surge pricing: collapsing overlapping high-demand windows in a geofence into one surge band.
  • PagerDuty oncall scheduling: detecting overlapping shifts before publishing the schedule to slack and email.
  • Hotel and airline reservations at Booking.com, Expedia, Marriott: merging holds and confirmed reservations on the same room or seat.
  • CDN cache invalidation at Cloudflare and Fastly: merging overlapping purge windows so the origin only sees one purge request.
  • Compliance audit log analysis in fintech and healthcare: merging overlapping audit-relevant event windows to compute exposure intervals.
  • VLSI design at Synopsys and Cadence: merging overlapping signal-active windows during static timing analysis.

The pattern is universal: any time you have a list of [start, end] ranges and need a "collapse overlapping ones into single ranges" answer, you want this sweep.

Variants you'll see elsewhere

  • meeting-rooms: detect whether any pair of intervals overlaps. The same sort, a simpler sweep that returns false on the first conflict.
  • meeting-rooms-ii: count the minimum number of rooms needed for all meetings. The events sweep (start events as +1, end events as -1, sorted, running max).
  • insert-interval: insert one new interval into an already-sorted list. Three phases: copy non-overlapping prefix, merge the overlap region, copy the rest. The sweep is the merge phase.
  • employee-free-time: invert the merged intervals to get gaps. Merge first, then walk the gaps.
  • non-overlapping-intervals: remove the minimum number of intervals so the rest are non-overlapping. Sort by end time (different key) and use a greedy interval-scheduling algorithm.
  • my-calendar-iii: maintain a streaming version where each insert returns the current peak concurrency. Use a TreeMap of deltas.

Once you have merge-intervals, every other interval problem either reuses this sweep or replaces it with the events sweep.

How parikshan turns this into a learning loop

This is the entry point to the intervals cluster, and the integrated loop is built to drill the sort-then-sweep pattern until it is automatic.

  1. Read this article. The two key sentences are "sort by start" and "extend if s <= ce, else commit".
  2. Solve in practice mode with the AI chat off. Before writing code, draw the four-interval example on a number line and trace the sweep by hand. The drawing is the entire algorithm.
  3. Submit. The bank's dynamic private tests include the touching case, the empty case, a single interval, all-identical intervals, and a 10^5-interval stress test. Touching is the most common spec-misread; the public test catches it instantly.
  4. AI training: Optimise if your solution mutates the input list inefficiently. The assistant should produce the standard sort-then-sweep variant. Each request costs a slice of integrity.
  5. Move to meeting-rooms in exam mode. Same sort, simpler sweep. Solving it cold confirms you have generalised, not memorised.

Dispute is rare on interval problems because verdicts are unambiguous. The exception: if your 10^5 stress test seems to TLE on what you believe is an O(n log n) solution, dispute it. A human reviewer can re-examine the time limit. The platform expects scrutiny on stress harnesses.

In the AI-integrated workspace

Interval code is the place where AI agents most often misread the sort key or the comparison operator. The most common failure modes:

  1. Wrong sort key. The agent sorts by end time when start was the correct key, or sorts by interval length entirely. The output sometimes looks plausible on the public test and breaks on private ones.
  2. Wrong overlap operator. The agent uses < when <= was right (or vice versa). Touching-vs-overlapping is the spec ambiguity that generated code rarely raises.
  3. Forgetting the final commit. The last open interval is missing from the output because no gap triggered its emit. Caught when the input ends mid-merge.

The 2027 engineer's discipline for interval code:

  1. State the sort key and the merge condition in English before the agent codes. "Sort ascending by start. Merge when next.start <= current.end. Touching merges, since the spec says <=."
  2. Tell the agent to implement the sweep iteratively with an explicit out.append((cs, ce)) after the loop.
  3. Verify on the four-interval example by hand. Always. Cross-check the merged set against the input drawn on a number line.
  4. Stress on edge cases. Empty input. Single interval. All identical intervals. The touching case. Reverse-sorted input.

Whenever the spec mentions "calendar", "schedule", "billing window", "session lifetime", or "room conflict", the right move is merge-intervals or one of its tiny variants. Recognise the pattern in five seconds and you direct the AI agent productively. Miss the pattern and you review forty lines of generated nested loops that pass the sample input and fail in production. parikshan's loop trains the first kind of engineer.