parikshan

longest-palindromic-substring

Difficulty: Medium  ·  Topic: 1-D Dynamic Programming  ·  Practice it on parikshan: /practice/longest-palindromic-substring/start

The story

A bioinformatics pipeline at Illumina scans short reads for palindromic motifs, which are biological markers for inverted-repeat regions where DNA folds back on itself. Each read is a string of base characters, and the analyst needs the longest contiguous palindromic substring in each read. The same query runs across millions of reads per sample.

That query is longest-palindromic-substring. Beyond genomics it appears in spell-correction palindrome filters at Google and Microsoft Word, in domain-name palindrome detection at Cloudflare, in compressed-string detection in HuggingFace tokenisers, and in the canonical 'expand around centre' family of string algorithms taught alongside Manacher's algorithm.

The interview punchline: there are two clean solutions. Expand-around-centre is simpler, ties to two pointers, and runs in O(n²) time with O(1) space. A full 2-D DP also exists; both are interview-frequent. This article walks both, then notes Manacher's O(n) algorithm for completeness.

What's actually being asked

Given a string s, return the longest contiguous substring of s that is a palindrome (reads the same forwards and backwards). If multiple substrings tie for the maximum length, return the lexicographically smallest one.

A contiguous substring is a slice s[i..j]; "subsequence" is a different problem.

Examples:

  • s = "babad""aba". Both "aba" and "bab" are length-3 palindromes; "aba" is lex-smaller.
  • s = "cbbd""bb".
  • s = "a""a". Single character.
  • s = """". Empty.
  • s = "forgeeksskeegfor""geeksskeeg".

The naive approach

Enumerate every (i, j), slice s[i:j+1], check equality with its reverse, track the longest. O(n³). At |s| = 1000 that is 10⁹ operations: TLE.

def naive(s):
    best = ""
    n = len(s)
    for i in range(n):
        for j in range(i, n):
            t = s[i:j+1]
            if t == t[::-1] and (len(t) > len(best) or (len(t) == len(best) and t < best)):
                best = t
    return best

Correct, slow, never ship it.

The key insight

Every palindromic substring has a centre. For odd-length palindromes, the centre is a single character. For even-length palindromes, the centre is the gap between two adjacent characters. So there are exactly 2n - 1 possible centres in a string of length n.

For each centre, expand outward with two pointers l and r while s[l] == s[r] and the indices remain in bounds. The total work is O(n²): each centre might in the worst case expand to half the string.

# Expand-around-centre recurrence (operational definition):
# For each centre c in 0..n-1 (odd length):
#   l = r = c
#   while l >= 0 and r < n and s[l] == s[r]:
#     record (l, r) as palindrome
#     l -= 1; r += 1
#
# For each gap-centre c in 0..n-2 (even length):
#   l = c; r = c + 1
#   while l >= 0 and r < n and s[l] == s[r]:
#     record (l, r) as palindrome
#     l -= 1; r += 1
#
# Track the longest. Tie-break lexicographically at the end.

This is technically not "1-D DP" in the array sense, but it is the same DP idea collapsed: each centre's expansion reuses the result of the immediately smaller palindrome at the same centre (s[l+1..r-1] is a palindrome iff s[l..r] was, given s[l] == s[r]). The recurrence is implicit in the loop.

The key insight: explicit 2-D DP

State: dp[i][j] is True iff s[i..j] is a palindrome. The recurrence is:

# 2-D DP recurrence:
# dp[i][i] = True                            (length 1)
# dp[i][i+1] = (s[i] == s[i+1])              (length 2)
# dp[i][j] = (s[i] == s[j]) AND dp[i+1][j-1] (length >= 3)
#
# Track the longest (i, j) with dp[i][j] = True.

The four DP questions:

  1. State: dp[i][j] = whether s[i..j] is a palindrome.
  2. Base case: dp[i][i] = True and dp[i][i+1] = (s[i] == s[i+1]).
  3. Transition: dp[i][j] = (s[i] == s[j]) and dp[i+1][j-1] for j - i >= 2.
  4. Answer: among all (i, j) with dp[i][j] = True, the longest; lex-tie-break by substring.

This is O(n²) time and O(n²) space. Worse memory than expand-around-centre, same time complexity, but the recurrence is explicit and useful to teach.

Step-by-step approach

def longest_palindrome(s):
    n = len(s)
    if n == 0:
        return ""
    max_len = 1
    starts = [0]

    for i in range(n):
        # Odd-length palindromes centred at i
        l, r = i, i
        while l >= 0 and r < n and s[l] == s[r]:
            l -= 1
            r += 1
        length = r - l - 1
        if length > max_len:
            max_len = length
            starts = [l + 1]
        elif length == max_len:
            starts.append(l + 1)

        # Even-length palindromes centred between i and i+1
        l, r = i, i + 1
        while l >= 0 and r < n and s[l] == s[r]:
            l -= 1
            r += 1
        length = r - l - 1
        if length >= 1:
            if length > max_len:
                max_len = length
                starts = [l + 1]
            elif length == max_len:
                starts.append(l + 1)

    return min(s[st:st + max_len] for st in starts)

O(n²) time, O(1) extra space (the starts list holds at most a handful of positions in practice). At |s| = 1000 that is 10⁶ operations: sub-millisecond.

Worked example

s = "babad"

Centres:

  • i=0 ('b'), odd: l=0 r=0 match → l=-1 r=1. Palindrome s[0:1] = "b", length 1.
  • i=0, even: l=0 r=1, s[0]='b', s[1]='a', no match. Length 0.
  • i=1 ('a'), odd: l=1 r=1 match → l=0 r=2, s[0]='b', s[2]='b' match → l=-1 r=3. Palindrome s[0:3] = "bab", length 3.
  • i=1, even: l=1 r=2, s[1]='a', s[2]='b', no match. Length 0.
  • i=2 ('b'), odd: l=2 r=2 match → l=1 r=3, s[1]='a', s[3]='a' match → l=0 r=4, s[0]='b', s[4]='d', no match. Palindrome s[1:4] = "aba", length 3.
  • i=2, even: l=2 r=3, s[2]='b', s[3]='a', no match. Length 0.
  • i=3 ('a'), odd: length 1.
  • i=3, even: no match. Length 0.
  • i=4 ('d'), odd: length 1.

Length-3 palindromes start at 0 ("bab") and 1 ("aba"). Lex-smaller is "aba". Return "aba". Verified.

Complexity

  • Naive O(n³): TLE at |s| = 1000.
  • Expand-around-centre: O(n²) time, O(1) extra space. Ship this.
  • 2-D DP: O(n²) time, O(n²) space. Useful for teaching, wasteful for production.
  • Manacher's algorithm: O(n) time using palindrome-mirror reuse; the gold-standard for |s| >= 10⁵. Out of scope for this article but worth knowing.

Common pitfalls

  • Slicing inside the expand loop: s[l:r+1] == s[l:r+1][::-1] inside the inner loop is O(n) per check, turning the algorithm into O(n³) and re-introducing the TLE. Compare characters by index; slice only once at the end per candidate.
  • Forgetting the even-length case: only handling odd-centred palindromes misses "abba" entirely. Both odd and even centres are required.
  • Off-by-one on r - l - 1: after the while loop, l and r have moved one step past the palindrome bounds. The palindrome is s[l+1..r-1], of length r - l - 1. Drawing this once and never forgetting is the whole pitfall.
  • Wrong tie-break: returning any palindrome of max length without lexicographic comparison. The spec is explicit; lex-smallest wins.
  • Empty string handling: s = "" should return "". If your code unconditionally indexes s[0], it crashes. Guard the empty case.

Where this shows up in the enterprise

  • Illumina, Oxford Nanopore, 23andMe: palindromic-motif detection in DNA reads for inverted-repeat regions, restriction-site mapping, and hairpin-loop structure prediction.
  • Spell-check (Google search corrections, Microsoft Word suggestions): palindrome detection feeds heuristic features that influence suggestion ranking.
  • Cloudflare domain analytics: detect palindromic vanity domains for traffic-pattern grouping.
  • Stripe fraud detection: palindromic patterns in transaction IDs (and other identifiers) are weak signals in fraud-model features.
  • HuggingFace tokeniser audits: palindromic substrings in vocabulary entries indicate symmetric merge artefacts.
  • Compression and pattern-mining: palindromic substring lengths feed entropy estimators in some compression pipelines.

The pattern is universal whenever structure-preserving symmetry matters: any time the algorithm asks "where is the largest mirror-symmetric block", this is the tool.

Variants you'll see elsewhere

  • palindromic-substrings: count all palindromic substrings (same expansion, just increment a counter at each successful expansion).
  • longest-palindromic-subsequence: subsequence, not substring. 2-D DP with dp[i][j] = LPS of s[i..j], recurrence dp[i][j] = dp[i+1][j-1] + 2 if s[i] == s[j] else max(dp[i+1][j], dp[i][j-1]).
  • palindrome-partitioning: partition s into the fewest substrings each of which is a palindrome. Combines this palindrome check with a 1-D DP on partitions.
  • palindrome-pairs: given a list of words, find pairs whose concatenation is a palindrome.
  • manachers-algorithm: O(n) palindrome radii using mirror reuse.
  • valid-palindrome / palindrome-number: simpler check-only variants.

How parikshan turns this into a learning loop

This problem is the bridge between 1-D DP and 2-D DP. It uses 1-D-style centre expansion but conceptually relies on the 2-D recurrence.

  1. Read this article and state both recurrences before writing code. The 2-D DP dp[i][j] = (s[i] == s[j]) AND dp[i+1][j-1] is the conceptual core; the expand-around-centre is the operational shortcut.
  2. Practice mode with no AI: write expand-around-centre. Verify on "babad" → "aba". The lex tie-break is the easiest place to get bug-tripped; the practice-mode dynamic tests include "aacabdkacaa" and similar tie-rich inputs.
  3. AI training: Explain. Ask the assistant "why does expand-around-centre work?" The good answer mentions the implicit recurrence: a palindrome at radius r plus matching outer characters becomes a palindrome at radius r+1. If the assistant hand-waves, distrust its claims about other DP problems.
  4. AI training: Find Bugs. Submit code that slices inside the inner loop. The assistant should flag the O(n³) regression. This is the most common Python-specific bug in this problem family.
  5. AI training: Optimise. Once correct, ask for Manacher's. The assistant likely produces it; verify the answer matches your O(n²) on "abacaba" and "forgeeksskeegfor".
  6. Exam mode: redo cold. Bank adversarial tests include "" (empty), single-character, all-same "aaaaa", and length-1000 inputs that force the algorithm to do real work.

In the AI-integrated workspace

This is one of the most common interview problems where AI agents produce code that almost works.

  • The slicing bug: an agent writes if s[l:r+1] == s[l:r+1][::-1] inside the expand loop. Code is correct but O(n³); the sample test passes; the stress test on |s| = 1000 TLEs.
  • The missing even-centre bug: an agent writes only the odd-centre expansion. Code passes "babad" and "cbbd"; fails on "abba" (returns "a" instead of "abba").
  • The off-by-one in length: writes length = r - l + 1 after the while loop instead of r - l - 1. Returns lengths that are wrong by 2. The bug shows up as palindromes that are too long.
  • The lex tie-break ignored: the agent returns any palindrome of max length. On "babad" it might return "bab" instead of "aba". Sample tests sometimes accept both, masking the bug.

The 2027 engineer's discipline:

  1. Name the algorithm in five seconds: "expand around centre, O(n²), both odd and even centres, track (start, length) not slices". That single phrase steers the agent away from every failure mode above.
  2. State the off-by-one once: after the while loop, the palindrome bounds are s[l+1..r-1] of length r - l - 1. Repeat this aloud before code generation.
  3. Verify on "abba". The answer must be "abba" (even-centre catcher).
  4. Verify on "babad". The answer must be "aba" (lex tie-break catcher).
  5. Verify on "". The answer must be "" (empty-input catcher).
  6. Ask explicitly for character-by-index comparison, not slicing. This single instruction prevents the O(n³) regression that nearly every default agent ships.

A candidate who can articulate "expand around centre, O(n²), watch the even centres and the lex tie-break" within seconds is the engineer who directs AI to a correct, fast, two-dozen-line solution. Without that articulation, the agent ships code that passes the sample inputs and quietly breaks in production on "abba", "", or |s| = 1000.

longest-palindromic-substring