parikshan

longest-increasing-subsequence

Difficulty: Medium  ·  Topic: 1-D Dynamic Programming  ·  Practice it on parikshan: /practice/longest-increasing-subsequence/start

The story

A 23andMe ancestry pipeline aligns a query genome against a reference haplotype. After per-marker scoring, the alignment layer needs the longest strictly increasing chain of marker positions (a chain of positions whose scoring offsets are monotonically rising) to confirm a haplotype block. Longer chain, stronger inferred ancestry segment. The pipeline runs this query on hundreds of thousands of markers per sample.

That chain length is longest-increasing-subsequence (LIS). The same shape appears in airline patience-sort baggage routing, in box-stacking and envelope nesting, in Russian doll problems on retail SKU sizes, in version-graph compatibility chains at Stripe, in palaeoclimate reconstruction (Oxford Nanopore-style sequence pipelines), in compounded promotion eligibility at Uber's tier-up logic, and in the canonical patience-sort algorithm taught alongside Dilworth's theorem.

LIS is also the textbook example where a clean O(n²) DP gives the right answer, and a slicker O(n log n) patience-sort beats it. Both solutions are interview-frequent; this article walks both.

What's actually being asked

Given an integer array arr of length n, return the length of its longest strictly increasing subsequence. A subsequence is obtained by deleting zero or more elements without reordering. "Strictly" means each successive element must be greater than (not equal to) the previous one.

Examples:

  • arr = [10, 9, 2, 5, 3, 7, 101, 18] → 4. One LIS: [2, 3, 7, 101] (or [2, 3, 7, 18]).
  • arr = [0, 1, 0, 3, 2, 3] → 4. LIS: [0, 1, 2, 3].
  • arr = [7, 7, 7, 7, 7, 7, 7] → 1. No two equal elements pair.
  • arr = [] → 0.

The naive approach

Enumerate every subsequence (2^n), check increasing, track longest. Exponential. For n = 30 you're already past 10⁹ operations. Hopeless beyond n = 20.

The key insight

Define dp[i] as the length of the LIS that ends exactly at index i. Then dp[i] = 1 + max(dp[j] for j < i if arr[j] < arr[i]), with base dp[i] = 1 for the trivial single-element subsequence. The answer is max(dp).

# Quadratic DP recurrence:
# dp[i] = length of LIS ending at index i
# dp[i] = 1 + max{ dp[j] : 0 <= j < i, arr[j] < arr[i] }     (max over empty set = 0)
# Answer = max over i of dp[i]

The four DP questions:

  1. State: dp[i] = length of the longest strictly increasing subsequence that ends at index i.
  2. Base case: dp[i] = 1 (the subsequence consisting of just arr[i]).
  3. Transition: dp[i] = 1 + max(dp[j]) over all j < i with arr[j] < arr[i]. If no such j exists, dp[i] remains 1.
  4. Answer: max(dp).
def lis_n2(arr):
    n = len(arr)
    if n == 0: return 0
    dp = [1] * n
    for i in range(1, n):
        best = 0
        for j in range(i):
            if arr[j] < arr[i] and dp[j] > best:
                best = dp[j]
        dp[i] = best + 1
    return max(dp)

O(n²) time. Fine up to n ~ 5000. At n = 10⁵ you need a faster solution.

The key insight: O(n log n) patience sort

Keep a list tails such that, after processing the prefix arr[0..i], tails[k] is the smallest possible last element among all increasing subsequences of length k + 1 seen so far. By construction, tails is strictly increasing. Its length at the end equals the LIS length.

Why the smallest tail per length? Because a smaller tail is strictly easier to extend: any future element that could extend a length-(k+1) subsequence ending at t could also extend one ending at t' < t. Keeping the smallest tail never costs anything and may unlock longer subsequences later.

The update for a new element x:

  • Binary-search tails for the first index i with tails[i] >= x (use bisect_left).
  • If i == len(tails), append x. You have found a longer subsequence ending in x.
  • Otherwise, set tails[i] = x. You have found a smaller tail for length-(i+1) subsequences.
# Patience-sort recurrence (operational definition):
# tails = []
# for x in arr:
#   i = bisect_left(tails, x)
#   if i == len(tails): tails.append(x)
#   else: tails[i] = x
# Answer = len(tails)

Each binary search is O(log n); total O(n log n).

import bisect

def lis_nlogn(arr):
    tails = []
    for x in arr:
        i = bisect.bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)

Important caveat: tails is a length tracker, not an actual LIS. Its final contents may not correspond to any specific subsequence in the input. If you need to reconstruct an LIS, you must store predecessor pointers alongside the patience-sort run.

Strict vs non-decreasing: for strictly increasing use bisect_left. For non-decreasing (equal values count), use bisect_right. One-character bug, huge correctness implications.

Step-by-step approach

For interview ship, write the patience sort:

  1. Initialise tails = [] (will hold the smallest possible tail per LIS length).
  2. For each x in arr:
    • i = bisect_left(tails, x) (use bisect_right if the problem allows ties).
    • If i == len(tails): append x; you have extended to a new longest length.
    • Else: overwrite tails[i] = x; you have improved the smallest tail for that length.
  3. Return len(tails).

For the educational O(n²) DP:

  1. Allocate dp = [1] * n.
  2. For i from 1 to n - 1:
    • Set dp[i] = 1 + max(dp[j] for j < i if arr[j] < arr[i]), or 1 if no such j.
  3. Return max(dp) (or 0 if arr is empty).

Always ship the patience sort at n = 10⁵. Keep the DP for verification on small inputs.

Worked example

arr = [10, 9, 2, 5, 3, 7, 101, 18]
xbisect_left(tails, x)actiontails after
100append[10]
90replace[9]
20replace[2]
51append[2, 5]
31replace[2, 3]
72append[2, 3, 7]
1013append[2, 3, 7, 101]
183replace[2, 3, 7, 18]

len(tails) = 4. Verified.

Note that the final tails = [2, 3, 7, 18] happens to be a real LIS, but in general tails is a hybrid record (e.g. for [3, 1, 2, 4], after processing all elements tails = [1, 2, 4], which is a real LIS; but for [5, 1, 6, 2, 7], tails = [1, 2, 7] which is also a real LIS; many such inputs accidentally line up). Do not rely on it for reconstruction.

Complexity

  • O(n²) DP: time O(n²), space O(n). Acceptable to n ~ 5000.
  • Patience sort: time O(n log n), space O(n). Required at n = 10⁵.

For interview purposes know both. Most senior interviews ask for the O(n log n) and want you to articulate why the smallest-tail invariant works.

Common pitfalls

  • bisect_right instead of bisect_left for strict increasing: equal values get appended instead of replacing, inflating the count. On [7, 7, 7] you get 3 instead of 1.
  • Treating tails as an LIS: for many inputs the final tails happens to be a valid LIS; for many others it is not. Do not return it as the answer to "give me an LIS"; if you need the sequence, store predecessors during the run.
  • Off-by-one in the DP base case: dp[i] = 1, not 0. The single-element subsequence has length 1, not 0.
  • Sorting first: tempting but destroys order. LIS is fundamentally about the original order of the array.
  • Non-strict requirements: be sure of "strictly" vs "non-decreasing" before writing code. bisect_left vs bisect_right is the entire difference.

Where this shows up in the enterprise

  • Genomic chain alignment (Illumina secondary analysis, Oxford Nanopore basecalling, 23andMe ancestry): longest monotone chain of marker offsets is an LIS computation.
  • Box / envelope nesting at Amazon Supply Chain: longest chain of strictly nested box dimensions (sort by one dimension, LIS on the other).
  • Russian doll problems at retail SKU planning: similar nesting structure.
  • Bloomberg ticker analytics: longest monotone rally in a price stream (strict-increasing subsequence of intraday lows).
  • Stripe version-compatibility graph: longest chain of API versions that can be upgraded one-by-one without breaking.
  • Patience sort itself: the algorithm is named after the card game; in card-sorting hardware and airline baggage routing the operational sort is equivalent.
  • Uber driver tier compaction: longest chain of monotone weekly metrics required to qualify for a promotion tier.
  • NLP / IR alignment: longest monotone alignment between source and target token positions in machine translation evaluation.
  • Speech recognition (Apple Siri, Google Assistant) acoustic alignment: longest strictly increasing chain of frame indices in a confidence-ordered set.

LIS is one of the half-dozen DP problems that show up across multiple unrelated industries.

Variants you'll see elsewhere

  • longest-non-decreasing-subsequence: identical algorithm with bisect_right.
  • longest-decreasing-subsequence: negate the array and run LIS.
  • number-of-longest-increasing-subsequences: count, not length; O(n²) DP with a parallel count array, or O(n log n) with Fenwick tree.
  • russian-doll-envelopes: 2-D LIS. Sort by width ascending (and height descending to break ties), then LIS on heights.
  • longest-common-subsequence: 2-D DP, different problem family.
  • box-stacking: 3-D LIS variant.
  • patience-diff: source-code diff algorithm built atop patience sort.

The LIS skeleton generalises to any "longest monotone chain under a comparison" problem.

How parikshan turns this into a learning loop

LIS is the highest-leverage DP problem after Kadane. Drill both solutions.

  1. Read this article and state both recurrences before writing code. The O(n²) DP is "ends at i, scan back"; the patience sort is "smallest tail per length".
  2. Practice mode with no AI: code the O(n²) DP first. Verify on [10, 9, 2, 5, 3, 7, 101, 18] → 4. Then code the patience sort; verify on the same input.
  3. AI training: Explain. Ask the assistant why the smallest-tail invariant works. The good answer: "smaller tail is easier to extend, keeping the smallest never costs anything". Anything vaguer is a sign the assistant doesn't really know the algorithm; treat its claims about other DP problems with extra scepticism.
  4. AI training: Find Bugs. If you used bisect_right by accident on strict input, paste the code with input [7, 7, 7]. The assistant should point at the wrong bisect choice.
  5. AI training: Optimise. Submit the O(n²) and ask for O(n log n). The good response produces patience sort; a mediocre response just adds early termination.
  6. Exam mode: redo cold. The bank's adversarial test includes [] (empty, expect 0), [7, 7, 7, 7] (strict, expect 1), and a 10⁵-length array (must run the patience sort to clear).

In the AI-integrated workspace

LIS is one of the highest-frequency interview problems, and AI failure modes here are very stereotyped.

  • The bisect_right bug on strict-increasing problems: an agent grabs the wrong bisect helper and silently overcounts on equal values. Sample tests on distinct inputs pass; the [7, 7, 7, 7] adversarial test fails.
  • The "return tails" bug: an agent returns tails as the LIS itself, not just len(tails). The output looks plausible (sorted, increasing) but is not always a real subsequence of the input.
  • The O(n²) ship at n = 10⁵: an agent produces correct O(n²) DP, sample tests pass, the stress test TLEs.
  • The "sort and dedupe" trap: the agent sorts the input first to "make it easier"; this destroys order and the answer becomes len(set(arr)), completely wrong.

The 2027 engineer's discipline:

  1. Name the algorithm in three seconds: "patience sort, O(n log n), bisect_left for strict". That single phrase steers the agent away from every failure mode above.
  2. State the smallest-tail invariant in English: "tails[k] is the smallest possible last element of any length-(k+1) increasing subsequence seen so far". Pin the invariant before generating code.
  3. Verify on [10, 9, 2, 5, 3, 7, 101, 18]. The answer must be 4. Off by one on the strict-vs-non-decreasing distinction surfaces here.
  4. Verify on [7, 7, 7]. The answer must be 1. The bisect_right bug surfaces here.
  5. Push for both implementations in the agent's reply. The O(n²) DP is the verification harness; the patience sort is the production ship. Many agents only produce one; the discipline is to ask for both.

LIS, more than any other 1-D DP problem, rewards an engineer who knows the algorithm names and invariants. The parikshan loop is built so that by the time you finish this article, you can write both versions cold and explain the smallest-tail invariant in one sentence.

longest-increasing-subsequence