longest-common-prefix
Difficulty: Easy · Topic: Math, Matrix & Heaps · Practice it on parikshan: /practice/longest-common-prefix/start
The story
A search team at Algolia builds autocomplete for an e-commerce catalogue. As the user types airpod, the system suggests airpods, airpods pro, airpods max. To rank suggestions correctly, the system needs to know the longest shared prefix of a candidate set. If three suggestions all start with airpods, the UI groups them under that shared root and displays the divergence inline. Same logic powers IDE autocomplete in IntelliJ and VS Code, command-line tab completion in zsh, and DNS suffix matching in routers.
That is longest-common-prefix. It is also the cleanest worked example of a fold across an array: hold a candidate, walk the rest, shrink the candidate when reality disagrees.
What's actually being asked
Given an array of strings arr, find the longest string that is a prefix of every element of arr. If no common prefix exists, return the empty string. The empty string is by convention a prefix of every string, so the function is always well-defined.
The naive approach
Compare every pair of strings character by character and find their longest common prefix, then intersect those pairwise prefixes. That is O(n^2 * L) where L is the average string length. Correct, slow, and unnecessary: a single linear sweep does the same job.
The key insight
The longest common prefix is bounded above by the shortest string in the array. Better: it is the result of repeatedly intersecting a running candidate against each subsequent string. Start with arr[0] as the candidate. For each remaining string, shrink the candidate to the longest prefix it shares with that string. Stop as soon as the candidate becomes empty, because nothing further can rescue it.
This is the horizontal scan: a fold-left across the array, with the candidate string as the accumulator.
There is also a vertical scan: walk a column index j from zero upward. At each j, check that every string has at least j+1 characters and that all arr[i][j] equal arr[0][j]. The first mismatch fixes the answer length at j. Same asymptotic cost, different code shape.
Both are O(S) where S is the total number of characters across all inputs. Sort-and-compare-extremes (sort the array, take the LCP of the first and last) is O(S log n), slower than the linear scans and only useful when you already have the array sorted for another reason.
Step-by-step approach
- If
arris empty, return the empty string. - Initialise
prefix = arr[0]. - For each
sinarr[1:]:- Walk index
iwhilei < len(prefix)andi < len(s)andprefix[i] == s[i]. - Set
prefix = prefix[:i](truncate to the matched length). - If
prefixis empty, break early; nothing can extend it.
- Walk index
- Return
prefix.
Worked example
arr = ["flower", "flow", "flight"]
- prefix =
"flower". - Compare with
"flow": walk while equal:f=f,l=l,o=o,w=w. At i=4,"flower"[4] = 'e'but"flow"[4]is out of range. Truncate toprefix[:4] = "flow". - Compare with
"flight": walk:f=f,l=l. At i=2,"flow"[2] = 'o',"flight"[2] = 'i'. Truncate toprefix[:2] = "fl". - Return
"fl".
Try the no-match case:
arr = ["dog", "racecar", "car"]
- prefix =
"dog". - Compare with
"racecar": at i=0,"d"!="r". Truncate to empty. Break. - Return
"".
Complexity
- Time: O(S) where
Sis the sum of|arr[i]|. The total work over all comparisons cannot exceed reading every character of every input once. - Space: O(1) beyond the slice for the candidate, which is bounded by the shortest input.
Trade-offs between vertical and horizontal scan
The two algorithms have the same O(S) complexity but different worst-case shapes.
Horizontal scan wins when one of the early strings has no shared prefix with arr[0]: it short-circuits the moment the candidate goes empty. It loses when all strings share most of their characters and the candidate stays long for many rounds.
Vertical scan wins when the common prefix is short: it stops at the first divergent column without inspecting the tail of any string. It loses when the answer is long, because it traverses every input up to that depth.
In practice both are linear and either is fine. The horizontal scan is easier to write defensively because there are no per-string bounds checks inside the inner loop; the slice handles it for you.
Common pitfalls
- Forgetting an empty string in the array: the answer must be empty. Both algorithms handle this correctly without a special case; the horizontal scan truncates to zero on the first character comparison, and the vertical scan fails the length check at j=0.
- Returning the shortest input by length: it is the upper bound, not the answer.
["abc", "abd"]has shortest length 3 but LCP"ab". - Not breaking early: missing the early exit when the candidate becomes empty is correct but slow when the array has one outlier near the start.
- Sorting first: a tempting trick but loses time. Use it only when the array is already sorted.
Where this shows up in the enterprise
- Algolia, Elasticsearch, Lucene: term autocompletion and prefix-based ranking. Tries are built on shared prefixes, and the LCP of a candidate set is what the trie's branching node sits on.
- Git internals: object database lookups by shortened SHA hashes.
git show 4b14only works if no other object shares that prefix; the storage engine finds the LCP of matches and asks for more characters. - IntelliJ and VS Code autocompletion: when multiple identifiers complete the typed prefix, the editor inserts up to the LCP and lets you choose the rest.
- DNS resolvers and routers: longest-prefix match on the IPv4 routing table is the same shape, generalised to bit prefixes. BGP routers use it for every packet.
- AWS S3 and Google Cloud Storage: when you list objects with a prefix, the system internally computes shared prefixes for the
CommonPrefixesfield that powers folder-like UI. - Bloomberg terminal command autocomplete: traders type partial commands and the terminal completes up to the shared prefix of valid commands.
The shape is universal: whenever a UI groups similar items by their head or a system indexes by prefix, LCP is the underlying operation.
Variants you'll see elsewhere
longest-common-subsequence: same intuition (a fold across two strings) but allows gaps. Dynamic programming,O(nm).longest-common-substring: contiguous, but not necessarily anchored at the start. DP again.find-the-shortest-uniquely-identifying-prefix: given a set of strings, for each one find the shortest prefix not shared with any other. Solved with a trie.replace-words: given a dictionary of roots, replace any word in a sentence that begins with a root. Trie-based prefix matching.
LCP is the simplest member of the family. Once you have it, the trie-based versions are mechanical to write.
How parikshan turns this into a learning loop
After you read this article and click through to the practice link, the parikshan flow integrates four layers:
- The editor evaluates against public examples plus dynamic private tests generated each session. The private suite covers single-element arrays, an array containing an empty string, and a maximally long shared prefix near the 200-character boundary.
- Hints are graduated. The first asks why an empty string in the array forces the answer empty. The third hands you the candidate-shrink pattern. The fourth is the implementation skeleton.
- The AI assistant can Explain, Review, Find Bugs, Add Comments, or Optimise at small integrity costs in exam mode, free in practice. Ask it to compare your horizontal-scan solution against a vertical-scan version and explain when each is faster.
- The dispute flow lets you escalate a contested verdict to a human reviewer.
Solve once in practice with the assistant as a sparring partner. Then take it in exam mode and aim for two minutes.
In the AI-integrated workspace
When you build autocompletion or prefix indexing at work, the agent will produce most of the code. Your value is in:
- Recognising the LCP shape when a colleague describes "find the common starting characters of these candidates" or "group records by their shared prefix."
- Picking horizontal versus vertical scan for the inputs you expect, based on whether you anticipate a long or short common prefix.
- Verifying the agent handles the empty-string-in-array case. This is the most commonly missed edge case in AI-generated code for this problem.
- Stress-testing with arrays of identical strings (LCP is the whole string) and arrays where one string is empty (LCP is empty).
Engineers who can spot the LCP shape in a feature spec ship faster than engineers who write nested loops every time the question recurs.