longest-increasing-subsequence
Difficulty: Medium · Topic: 1-D Dynamic Programming · Practice it on parikshan: /practice/longest-increasing-subsequence/start
The story
A 23andMe ancestry pipeline aligns a query genome against a reference haplotype. After per-marker scoring, the alignment layer needs the longest strictly increasing chain of marker positions (a chain of positions whose scoring offsets are monotonically rising) to confirm a haplotype block. Longer chain, stronger inferred ancestry segment. The pipeline runs this query on hundreds of thousands of markers per sample.
That chain length is longest-increasing-subsequence (LIS). The same shape appears in airline patience-sort baggage routing, in box-stacking and envelope nesting, in Russian doll problems on retail SKU sizes, in version-graph compatibility chains at Stripe, in palaeoclimate reconstruction (Oxford Nanopore-style sequence pipelines), in compounded promotion eligibility at Uber's tier-up logic, and in the canonical patience-sort algorithm taught alongside Dilworth's theorem.
LIS is also the textbook example where a clean O(n²) DP gives the right answer, and a slicker O(n log n) patience-sort beats it. Both solutions are interview-frequent; this article walks both.
What's actually being asked
Given an integer array arr of length n, return the length of its longest strictly increasing subsequence. A subsequence is obtained by deleting zero or more elements without reordering. "Strictly" means each successive element must be greater than (not equal to) the previous one.
Examples:
arr = [10, 9, 2, 5, 3, 7, 101, 18]→ 4. One LIS:[2, 3, 7, 101](or[2, 3, 7, 18]).arr = [0, 1, 0, 3, 2, 3]→ 4. LIS:[0, 1, 2, 3].arr = [7, 7, 7, 7, 7, 7, 7]→ 1. No two equal elements pair.arr = []→ 0.
The naive approach
Enumerate every subsequence (2^n), check increasing, track longest. Exponential. For n = 30 you're already past 10⁹ operations. Hopeless beyond n = 20.
The key insight
Define dp[i] as the length of the LIS that ends exactly at index i. Then dp[i] = 1 + max(dp[j] for j < i if arr[j] < arr[i]), with base dp[i] = 1 for the trivial single-element subsequence. The answer is max(dp).
# Quadratic DP recurrence:
# dp[i] = length of LIS ending at index i
# dp[i] = 1 + max{ dp[j] : 0 <= j < i, arr[j] < arr[i] } (max over empty set = 0)
# Answer = max over i of dp[i]
The four DP questions:
- State:
dp[i]= length of the longest strictly increasing subsequence that ends at indexi. - Base case:
dp[i] = 1(the subsequence consisting of justarr[i]). - Transition:
dp[i] = 1 + max(dp[j])over allj < iwitharr[j] < arr[i]. If no suchjexists,dp[i]remains 1. - Answer:
max(dp).
def lis_n2(arr):
n = len(arr)
if n == 0: return 0
dp = [1] * n
for i in range(1, n):
best = 0
for j in range(i):
if arr[j] < arr[i] and dp[j] > best:
best = dp[j]
dp[i] = best + 1
return max(dp)
O(n²) time. Fine up to n ~ 5000. At n = 10⁵ you need a faster solution.
The key insight: O(n log n) patience sort
Keep a list tails such that, after processing the prefix arr[0..i], tails[k] is the smallest possible last element among all increasing subsequences of length k + 1 seen so far. By construction, tails is strictly increasing. Its length at the end equals the LIS length.
Why the smallest tail per length? Because a smaller tail is strictly easier to extend: any future element that could extend a length-(k+1) subsequence ending at t could also extend one ending at t' < t. Keeping the smallest tail never costs anything and may unlock longer subsequences later.
The update for a new element x:
- Binary-search
tailsfor the first indexiwithtails[i] >= x(usebisect_left). - If
i == len(tails), appendx. You have found a longer subsequence ending inx. - Otherwise, set
tails[i] = x. You have found a smaller tail for length-(i+1)subsequences.
# Patience-sort recurrence (operational definition):
# tails = []
# for x in arr:
# i = bisect_left(tails, x)
# if i == len(tails): tails.append(x)
# else: tails[i] = x
# Answer = len(tails)
Each binary search is O(log n); total O(n log n).
import bisect
def lis_nlogn(arr):
tails = []
for x in arr:
i = bisect.bisect_left(tails, x)
if i == len(tails):
tails.append(x)
else:
tails[i] = x
return len(tails)
Important caveat: tails is a length tracker, not an actual LIS. Its final contents may not correspond to any specific subsequence in the input. If you need to reconstruct an LIS, you must store predecessor pointers alongside the patience-sort run.
Strict vs non-decreasing: for strictly increasing use bisect_left. For non-decreasing (equal values count), use bisect_right. One-character bug, huge correctness implications.
Step-by-step approach
For interview ship, write the patience sort:
- Initialise
tails = [](will hold the smallest possible tail per LIS length). - For each
xinarr:i = bisect_left(tails, x)(usebisect_rightif the problem allows ties).- If
i == len(tails): appendx; you have extended to a new longest length. - Else: overwrite
tails[i] = x; you have improved the smallest tail for that length.
- Return
len(tails).
For the educational O(n²) DP:
- Allocate
dp = [1] * n. - For
ifrom1ton - 1:- Set
dp[i] = 1 + max(dp[j] for j < i if arr[j] < arr[i]), or1if no suchj.
- Set
- Return
max(dp)(or0ifarris empty).
Always ship the patience sort at n = 10⁵. Keep the DP for verification on small inputs.
Worked example
arr = [10, 9, 2, 5, 3, 7, 101, 18]
| x | bisect_left(tails, x) | action | tails after |
|---|---|---|---|
| 10 | 0 | append | [10] |
| 9 | 0 | replace | [9] |
| 2 | 0 | replace | [2] |
| 5 | 1 | append | [2, 5] |
| 3 | 1 | replace | [2, 3] |
| 7 | 2 | append | [2, 3, 7] |
| 101 | 3 | append | [2, 3, 7, 101] |
| 18 | 3 | replace | [2, 3, 7, 18] |
len(tails) = 4. Verified.
Note that the final tails = [2, 3, 7, 18] happens to be a real LIS, but in general tails is a hybrid record (e.g. for [3, 1, 2, 4], after processing all elements tails = [1, 2, 4], which is a real LIS; but for [5, 1, 6, 2, 7], tails = [1, 2, 7] which is also a real LIS; many such inputs accidentally line up). Do not rely on it for reconstruction.
Complexity
- O(n²) DP: time
O(n²), spaceO(n). Acceptable ton ~ 5000. - Patience sort: time
O(n log n), spaceO(n). Required atn = 10⁵.
For interview purposes know both. Most senior interviews ask for the O(n log n) and want you to articulate why the smallest-tail invariant works.
Common pitfalls
bisect_rightinstead ofbisect_leftfor strict increasing: equal values get appended instead of replacing, inflating the count. On[7, 7, 7]you get 3 instead of 1.- Treating
tailsas an LIS: for many inputs the finaltailshappens to be a valid LIS; for many others it is not. Do not return it as the answer to "give me an LIS"; if you need the sequence, store predecessors during the run. - Off-by-one in the DP base case:
dp[i] = 1, not0. The single-element subsequence has length 1, not 0. - Sorting first: tempting but destroys order. LIS is fundamentally about the original order of the array.
- Non-strict requirements: be sure of "strictly" vs "non-decreasing" before writing code.
bisect_leftvsbisect_rightis the entire difference.
Where this shows up in the enterprise
- Genomic chain alignment (Illumina secondary analysis, Oxford Nanopore basecalling, 23andMe ancestry): longest monotone chain of marker offsets is an LIS computation.
- Box / envelope nesting at Amazon Supply Chain: longest chain of strictly nested box dimensions (sort by one dimension, LIS on the other).
- Russian doll problems at retail SKU planning: similar nesting structure.
- Bloomberg ticker analytics: longest monotone rally in a price stream (strict-increasing subsequence of intraday lows).
- Stripe version-compatibility graph: longest chain of API versions that can be upgraded one-by-one without breaking.
- Patience sort itself: the algorithm is named after the card game; in card-sorting hardware and airline baggage routing the operational sort is equivalent.
- Uber driver tier compaction: longest chain of monotone weekly metrics required to qualify for a promotion tier.
- NLP / IR alignment: longest monotone alignment between source and target token positions in machine translation evaluation.
- Speech recognition (Apple Siri, Google Assistant) acoustic alignment: longest strictly increasing chain of frame indices in a confidence-ordered set.
LIS is one of the half-dozen DP problems that show up across multiple unrelated industries.
Variants you'll see elsewhere
longest-non-decreasing-subsequence: identical algorithm withbisect_right.longest-decreasing-subsequence: negate the array and run LIS.number-of-longest-increasing-subsequences: count, not length; O(n²) DP with a parallel count array, or O(n log n) with Fenwick tree.russian-doll-envelopes: 2-D LIS. Sort by width ascending (and height descending to break ties), then LIS on heights.longest-common-subsequence: 2-D DP, different problem family.box-stacking: 3-D LIS variant.patience-diff: source-code diff algorithm built atop patience sort.
The LIS skeleton generalises to any "longest monotone chain under a comparison" problem.
How parikshan turns this into a learning loop
LIS is the highest-leverage DP problem after Kadane. Drill both solutions.
- Read this article and state both recurrences before writing code. The O(n²) DP is "ends at i, scan back"; the patience sort is "smallest tail per length".
- Practice mode with no AI: code the O(n²) DP first. Verify on
[10, 9, 2, 5, 3, 7, 101, 18] → 4. Then code the patience sort; verify on the same input. - AI training: Explain. Ask the assistant why the smallest-tail invariant works. The good answer: "smaller tail is easier to extend, keeping the smallest never costs anything". Anything vaguer is a sign the assistant doesn't really know the algorithm; treat its claims about other DP problems with extra scepticism.
- AI training: Find Bugs. If you used
bisect_rightby accident on strict input, paste the code with input[7, 7, 7]. The assistant should point at the wrong bisect choice. - AI training: Optimise. Submit the O(n²) and ask for O(n log n). The good response produces patience sort; a mediocre response just adds early termination.
- Exam mode: redo cold. The bank's adversarial test includes
[](empty, expect 0),[7, 7, 7, 7](strict, expect 1), and a 10⁵-length array (must run the patience sort to clear).
In the AI-integrated workspace
LIS is one of the highest-frequency interview problems, and AI failure modes here are very stereotyped.
- The
bisect_rightbug on strict-increasing problems: an agent grabs the wrong bisect helper and silently overcounts on equal values. Sample tests on distinct inputs pass; the[7, 7, 7, 7]adversarial test fails. - The "return tails" bug: an agent returns
tailsas the LIS itself, not justlen(tails). The output looks plausible (sorted, increasing) but is not always a real subsequence of the input. - The O(n²) ship at n = 10⁵: an agent produces correct O(n²) DP, sample tests pass, the stress test TLEs.
- The "sort and dedupe" trap: the agent sorts the input first to "make it easier"; this destroys order and the answer becomes
len(set(arr)), completely wrong.
The 2027 engineer's discipline:
- Name the algorithm in three seconds: "patience sort, O(n log n), bisect_left for strict". That single phrase steers the agent away from every failure mode above.
- State the smallest-tail invariant in English: "tails[k] is the smallest possible last element of any length-(k+1) increasing subsequence seen so far". Pin the invariant before generating code.
- Verify on
[10, 9, 2, 5, 3, 7, 101, 18]. The answer must be 4. Off by one on the strict-vs-non-decreasing distinction surfaces here. - Verify on
[7, 7, 7]. The answer must be 1. Thebisect_rightbug surfaces here. - Push for both implementations in the agent's reply. The O(n²) DP is the verification harness; the patience sort is the production ship. Many agents only produce one; the discipline is to ask for both.
LIS, more than any other 1-D DP problem, rewards an engineer who knows the algorithm names and invariants. The parikshan loop is built so that by the time you finish this article, you can write both versions cold and explain the smallest-tail invariant in one sentence.