parikshan

merge-two-sorted-lists

Difficulty: Easy  ·  Topic: Linked List  ·  Practice it on parikshan: /practice/merge-two-sorted-lists/start

The story

PostgreSQL's external sort spills to disk in sorted runs. When the final pass merges them, each run is a sorted stream and the planner consumes them in a k-way merge. The base case of a k-way merge is the two-way merge: walk two sorted streams in lockstep, always emitting the smaller head. With trillion-row analytical queries at Snowflake, Databricks, or BigQuery, that merge is the hottest inner loop in the entire query engine.

merge-two-sorted-lists is the merge half of merge sort, expressed on linked lists. The list framing matters: a real merge operates on streams without random access, exactly the way a Linked List operates. The pattern shows up everywhere data is ingested from multiple sorted sources.

What's actually being asked

You are given the heads of two singly linked lists, each sorted in non-decreasing order. Merge them into a single sorted list and return the new head. The input lists may be empty; their combined length is at most 100,000.

The naive approach

Concatenate the two lists into one, then sort. O((n1 + n2) log(n1 + n2)) time. Correct, slow, and throws away the precondition. The fact that the inputs are already sorted is the entire problem; ignoring it is a signal you have not read the spec.

A second naive: collect every value into an array, sort, build a fresh list. Same complexity, extra O(n) space. Same diagnosis: the precondition was free information.

The key insight

Two sorted streams can be merged in one pass by always taking the smaller of the two front values. The reason this works: every value in either list is already in sorted order relative to its own list, so the smaller of a[i] and b[j] is by definition the smallest value not yet placed in the output. Advance the index of whichever side you took.

When one stream runs out, the remaining stream is already sorted and can be appended wholesale.

The linked-list twist: do not allocate new nodes. Splice the existing nodes into the output by reassigning next pointers. This is O(1) extra memory and is what the spec is testing.

Step-by-step approach

The cleanest implementation uses a dummy head sentinel. The dummy node lets the loop append without a "is this the first node?" special case; at the end, return dummy.next.

class ListNode:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next

def merge_two_lists(a, b):
    dummy = ListNode()                 # sentinel; never returned to caller
    tail = dummy
    while a is not None and b is not None:
        if a.val <= b.val:
            tail.next = a
            a = a.next
        else:
            tail.next = b
            b = b.next
        tail = tail.next
    tail.next = a if a is not None else b   # append the remaining tail
    return dummy.next

Loop invariant in English: dummy.next ... tail is a sorted list containing every node already consumed from a and b, and a/b point at the unconsumed remainders, which are themselves sorted. The final splice covers the case where one list ran dry.

Use <= (not <) so that equal values from the left list are preferred. This makes the merge stable, which matters in production where two records with equal keys must retain their original order (e.g., PostgreSQL's ORDER BY with a stable sort flag, or Stripe's transaction ordering when two charges share a timestamp).

Worked example

a = 1 -> 2 -> 4
b = 1 -> 3 -> 5
Stepa.valb.valtakeoutput so fara remainingb remaining
111a12 -> 41 -> 3 -> 5
221b1 -> 12 -> 43 -> 5
323a1 -> 1 -> 243 -> 5
443b1 -> 1 -> 2 -> 345
545a1 -> 1 -> 2 -> 3 -> 4null5

Loop exits (a is null). Splice the remaining b (single node 5). Final list: 1 -> 1 -> 2 -> 3 -> 4 -> 5. ✓

Counter-cases:

  • One list empty: the loop never runs; the splice tail.next = a or b attaches the non-empty list directly. O(1).
  • Both empty: the loop never runs; the splice attaches null. Return dummy.next = null. Correct.

Complexity

  • Time: O(n1 + n2). Every node is touched once.
  • Space: O(1) extra. No new nodes are allocated; the dummy is a single stack-resident sentinel.

The output list reuses the input nodes; if the caller still holds references to either original head, those lists are now interleaved with the output and effectively destroyed. If the caller needs to keep them, copy first; that bumps space to O(n1 + n2).

Common pitfalls

  1. Skipping the dummy head. The "is this the first node?" branch the dummy avoids is the Linked List primer's first lesson. Without it, the code has four special cases (a empty, b empty, first node from a, first node from b) and at least one always has a bug.
  2. Forgetting to splice the tail. After the loop, exactly one of a or b is non-null. Forgetting tail.next = a or b produces a list of size min(n1, n2) instead of n1 + n2. The sample input [1, 2, 4] and [1, 3, 5] exposes this bug; the dynamic private test on parikshan generates lists of very different lengths to hammer it.
  3. Using < instead of <=. Equal values from the right list are taken first, which inverts the stable-sort property. Most public test suites do not catch this; production reordering does.
  4. Re-allocating nodes. new_node = ListNode(min(a.val, b.val)) works but doubles memory and ignores the splice instruction. In an interview, "did you mutate in place?" is the unspoken follow-up.

Where this shows up in the enterprise

  • PostgreSQL / MySQL external sort: the multi-way merge of spilled sorted runs. Two-way merge is the base case.
  • Snowflake / Databricks / BigQuery query plans: merge nodes in sort-merge join, group-by, and window functions.
  • Apache Kafka log compaction: merging two sorted segments produces the compacted segment.
  • Stripe / Adyen ledger reconciliation: two sorted transaction streams from acquirer and processor are merged on (timestamp, id).
  • Bloomberg market data feeds: merging two sorted-by-time price streams from primary and backup exchanges.
  • Apple News / Google News aggregators: merging sorted-by-publish-time article feeds.
  • Git log --merge-order: a two-way merge of commit lists from two branches.

The fingerprint is always two already-sorted streams that must produce one sorted stream without re-sorting. If you ever see a senior engineer reach for sorted(a + b) on two sorted inputs, you know they have not internalised this pattern.

Variants you'll see elsewhere

  • merge-k-sorted-lists: generalises to k streams. Naive is k-way merge with a min-heap (O((n) log k)); brute is concatenate-then-sort. The two-way merge in this article is called repeatedly.
  • sort-list: merge-sort on a linked list. The merge step is this exact function; the split step uses middle-of-linked-list.
  • interval-list-intersections: two sorted lists of intervals, output their intersections. Same two-pointer skeleton.
  • meeting-rooms-ii: a single sorted list of starts and a sorted list of ends; merge to count overlaps.
  • merge-sorted-arrays: array variant. Same idea, indices instead of pointers.

The pattern is the two-pointer sweep over sorted inputs. It is one of the four big patterns in this catalog, alongside Sliding Window, Binary Search, and Hash Map.

How parikshan turns this into a learning loop

Recommended sequence on parikshan:

  1. Solve reverse-linked-list first; that locks the dummy-head and splice patterns.
  2. Open this in practice mode, AI off. Code the dummy-head template. Run public tests.
  3. The dynamic private tests will throw: empty + empty, empty + non-empty, equal first values, lists of very different lengths, lists with all-equal values. The empty cases and the equal-first-value case catch the most common bugs.
  4. Re-solve without the dummy head and feel the four extra branches you needed. Decide whether the dummy is worth the four-line cost (answer: yes, always).
  5. Once green, ask the parikshan AI to Review for stability. It should mention <= vs <. If it does not, ask "is this merge stable?" The answer trains both you and the model.

In the AI-integrated workspace

Generated merge-two-sorted-lists code is usually correct, with two failure modes: (1) missing dummy head (four-way special-case soup), (2) forgetting the post-loop tail splice. Both bugs show up on test inputs where one list is much shorter than the other; both are silent on the LeetCode sample because the sample is [1, 2, 4] and [1, 3, 5] (same length, neatly interleaved).

Verification ritual on every agent-generated merge:

  1. Is there a dummy head? If not, ask the agent to refactor with one. The refactor will simplify the code so visibly that the agent will tell you it should have done this the first time.
  2. After the loop, is the remaining tail spliced? Find the tail.next = a if ... else b line. If it is missing, the merge truncates.
  3. Is the comparison <= or <? If the spec demands stable order, <= is required.

Senior engineers reading agent code do this in under thirty seconds. Junior engineers (and the agents themselves) ship the bug. parikshan's Linked List cluster trains the senior reflex.

merge-two-sorted-lists