word-ladder
Difficulty: Hard · Topic: Graphs · Practice it on parikshan: /practice/word-ladder/start
The story
A language-learning startup builds a "step-by-step" feature that turns one word into another, one letter at a time, while always passing through real dictionary words. The marketing team thinks it is a cute warm-up exercise. The engineering team realises within a week that it is the single most-asked-about feature, because the user is delighted by a clean five-step transformation and bored by an eight-step one. The product owner wants the minimum ladder length, computed on a dictionary of up to 5,000 words, in well under a second.
This is word-ladder. The same algorithm runs spell-correction suggestion ranking, DNA mutation distance computation, anagram-distance pathfinding in word-game engines, route-finding over implicit state graphs in puzzle solvers, and graph-based recommendation refinement at Spotify and Netflix when the "graph" is implicit and built lazily. The keyword is implicit graph: the edges are not stored, they are generated on demand.
This is the only Hard problem in the graphs cluster. The reason it is hard is not the algorithm (it is BFS) but the engineering: a naive BFS times out, and the two optimisations that save it are conceptually the most interesting of the cluster.
What's actually being asked
You receive beginWord, endWord, and a dictionary of M same-length lowercase words. A "transformation sequence" is a chain s0, s1, ..., sk where s0 = beginWord, sk = endWord, every adjacent pair differs in exactly one letter, and every s1, ..., sk is in the dictionary. Return the number of words in the shortest sequence (k + 1), or 0 if none exists.
The naive approach
Build an explicit graph: for every pair of words, check whether they differ in exactly one letter. If so, add an edge. Then BFS from beginWord to endWord.
The adjacency build is O(M^2 * L): every pair of M words compared character by character. For M = 5000 and L = 10, that is 2.5 * 10^8 character comparisons just to build the graph. In Python, that is multi-second. The grader runs out of patience.
There is no way to speed up the BFS itself: it is already O(V + E). The bottleneck is the neighbour function.
The key insight
Two words are neighbours iff they differ in exactly one position. Equivalently, they share an L-1-character "pattern" with one position replaced by a wildcard *. Group words by patterns: for each word, generate its L wildcard patterns and put the word into each pattern's bucket. Two words are neighbours iff they sit in at least one shared bucket.
Building the bucket map is O(M * L^2): M words, L patterns per word, L-length string per pattern. For M = 5000, L = 10, that is 5 * 10^5 operations. Negligible.
The BFS now expands a word's neighbours by iterating its L patterns and walking the bucket for each pattern: O(L^2 + L * (bucket size)) per word. Across the whole BFS, total work is O(M * L^2) and the constant factor is small enough that Python comfortably finishes in 100ms on the largest input.
That single optimisation is enough to pass the grader on every test. But there is a second optimisation that drops the wall-clock by another factor of 2 to 10 and that is interview gold.
The key insight (2 of 2): bidirectional BFS
Single-end BFS expands a frontier from beginWord until it touches endWord. The branching factor b is roughly the average number of one-edit neighbours per word. The work is O(b^d) where d is the answer (the shortest ladder length). For b = 10 and d = 8, that is 10^8. Tight even with bucket lookups.
Bidirectional BFS runs two BFS frontiers, one from each end, expanding the smaller of the two at each step, stopping as soon as the two frontiers intersect. The work is O(b^(d/2) + b^(d/2)) = O(b^(d/2)). For b = 10 and d = 8, that is 2 * 10^4. Four orders of magnitude cheaper than single-end BFS.
The geometric intuition: a ball of radius d/2 from each end has total volume 2 * b^(d/2), which is much less than a single ball of radius d (volume b^d). Two small balls beat one big ball.
The implementation discipline: at each step, expand the smaller frontier. This is the half-life of the optimisation. Expanding the larger frontier doubles its size while the other waits; expanding the smaller keeps both frontiers roughly the same size and minimises wasted work.
Step-by-step approach
Single-end BFS with patterns
- Read
begin,end, dictionary listwords. - If
end not in set(words): return0. - If
begin == end: return1. - Build
pat -> [words]map: for each wordwinwords, for eachiin0..L-1, appendwtobucket[w[:i] + '*' + w[i+1:]]. - BFS: deque of
(word, steps), visited set starting withbegin. Pop, generate patterns, walk each pattern's bucket; ifendin bucket, returnsteps + 1; otherwise for each unvisited word in the bucket, mark and enqueue withsteps + 1. After expansion, clear the pattern's bucket (so it is not re-walked). - Return
0if the queue empties.
Bidirectional BFS
- Same pre-checks and bucket build.
- Maintain two sets:
front = {begin}andback = {end}. Maintainvisitedwith both. steps = 1.- While both
frontandbackare non-empty:- If
len(front) > len(back): swap. - Build
next_front. For eachwinfront, for each pattern ofw, for eachneighbourin the bucket:- If
neighbour in back: returnsteps + 1. - If
neighbour not in visited: add tovisitedandnext_front.
- If
front = next_front.steps += 1.
- If
- Return
0.
Worked example
begin = hit, end = cog
dict = [hot, dot, dog, lot, log, cog]
Buckets (excerpt): *ot -> [hot, dot, lot], h*t -> [hot], ho* -> [hot], *og -> [dog, log, cog], d*t -> [dot], do* -> [dot, dog], l*t -> [lot], lo* -> [lot, log], c*g -> [cog], co* -> [cog], ...
Single-end BFS:
(hit, 1)-> patterns*it, h*t, hi*. Bucketh*tyieldshot. Enqueue(hot, 2). No others.(hot, 2)-> patterns*ot, h*t, ho*.*otyieldsdot, lot. Enqueue both with 3.h*tis empty after the previous step.ho*is empty.(dot, 3)->*otempty.d*tyields nothing new (dotitself only).do*yieldsdog. Enqueue(dog, 4).(lot, 3)->lo*yieldslog. Enqueue(log, 4).(dog, 4)->*ogyieldslog, cog.cog == end. Return5.
Bidirectional BFS converges faster:
front = {hit},back = {cog}.steps = 1.- Smaller is either; expand
front. Neighbours{hot}.next_front = {hot}.steps = 2. - Now
back = {cog}smaller. Expandback. Patterns ofcog:*og->dog, log;c*g-> none new;co*-> none new.next_back = {dog, log}.steps = 3. - Now both are size 2. Expand
front = {hot}. Patterns:*ot->dot, lot. Neither inback.next_front = {dot, lot}.steps = 4. - Expand
back = {dog, log}. Fromdog:*og->log, cog(cog already in back).d*g-> none new.do*->dot.dotis infront. Returnsteps + 1 = 5.
Same answer, fewer cells touched. On larger graphs the gap is dramatic.
Complexity
- Bucket build:
O(M * L^2). - Single-end BFS:
O(M * L^2 + M * L * (avg bucket size)). In the worst caseO(M * L * average_branching), comfortable forM = 5000,L = 10. - Bidirectional BFS: same theoretical worst case, much smaller in practice.
- Memory:
O(M * L^2)for the bucket map plusO(M)for the visited set.
Common pitfalls
- Brute-force adjacency build:
O(M^2 * L)is the timeout. The wildcard-bucket transform is the saver. - Not pre-checking
endWordmembership: withendnot in the dictionary, the BFS exhausts the entire reachable set before returning0. The single-lineif end not in dict: return 0is a 100x speedup on adversarial inputs. - Forgetting
beginWord == endWord: the spec says "number of words in the sequence", and the shortest sequence is[begin]with one word. - Off-by-one on the return: the answer is the number of words in the path, which equals edges + 1. BFS depth counts edges; you return depth + 1.
- Failing to clear visited buckets in single-end BFS: each pattern bucket is walked once, then no longer useful; clearing it avoids re-walking and keeps the constant factor low.
- Bidirectional BFS without "always expand the smaller frontier": degrades to single-end BFS in the worst case and may even be slower than single-end because of the double bookkeeping.
- Treating
beginWordas a member of the dictionary by default: the spec says "begin may or may not be in the dictionary". The BFS starts atbeginregardless.
Where this shows up in the enterprise
- Google search "Did you mean": an edit-distance graph over the corpus is the implicit graph here. BFS depth = edit distance.
- Spotify recommendations: walking the song-similarity graph from a seed track to a target track and reporting the shortest connecting chain is the same algorithm on a vastly larger implicit graph.
- Slack channel membership (analogy): "shortest chain of channels through which user A and user Z are connected" runs the same way.
- Twitter/X feed graph: "fewest retweets connecting these two accounts" is BFS over the follow graph; bidirectional BFS is the production trick.
- Google Maps routing (when edge weights are uniform): "fewest turns" rather than "shortest distance" reduces to BFS. Bidirectional BFS is used at intersections and inside the rendering shortcut.
- DNA mutation distance: minimum number of point mutations between two genes; the graph is implicit (all single-letter substitutions), and the algorithm is bidirectional BFS over the bucket structure.
- Compiler optimisation passes: the smallest sequence of program-rewrite rules that takes one expression to another is a BFS over an implicit graph of rewrites.
- Bazel/Gradle "what is the smallest set of changes needed to flip my build from state A to state B": bidirectional BFS over a state graph.
- Network routing in mesh networks (Cloudflare Workers): "fewest hops" routing is BFS; bidirectional BFS is the optimisation used in large planet-scale meshes.
The pattern is implicit graph + shortest path + bidirectional BFS for scale.
Variants you'll see elsewhere
word-ladder-ii: return all shortest transformation sequences. The BFS is the same; you also need to reconstruct paths, which adds a layer of bookkeeping.minimum-genetic-mutation: same algorithm, fixed 8-character DNA strings.open-the-lock: same algorithm on 4-digit lock combinations; the implicit graph has 10,000 nodes.bus-routes: BFS over a graph where the nodes are stops and the edges are co-membership in a route.shortest-path-in-binary-matrix: BFS over a 2-D grid with 8-connectivity.
The unifying pattern: the graph is too big or expensive to materialise, so generate neighbours on the fly with a function. Bidirectional BFS halves the search depth wherever the graph is large.
How parikshan turns this into a learning loop
This is the cluster's flagship Hard problem. The recommended flow takes a week, not a day:
- Day 1, practice mode: brute-force adjacency + BFS. Submit. Watch it time out on the stress test. Internalise the bottleneck.
- AI training: Explain the wildcard-bucket pattern. The agent should articulate "two words are one-edit neighbours iff they share an
L-1-character pattern". - Day 2, practice mode: pattern-bucket BFS. Submit. It passes. Note the wall-clock.
- AI training: Explain bidirectional BFS. The agent should articulate the
b^dvs2 * b^(d/2)argument with numbers. - Day 3, practice mode: bidirectional BFS. Submit. It passes, and the wall-clock is 3-10x smaller than single-end. The number is the reward for the optimisation.
- Exam mode: redo from blank under 30 minutes. The integrity score on this problem is the cluster's gold standard.
The reason this problem is in the cluster is to force the lesson that the algorithm is not the bottleneck; the neighbour function is. That lesson recurs in every implicit-graph problem in your career.
In the AI-integrated workspace
When you direct an agent to write word-ladder, the failure modes range from "fine but slow" to "subtly wrong":
- The agent builds explicit adjacency with
O(M^2 * L): passes the public tests, times out on the stress test. Demand the wildcard-bucket optimisation by name. - The agent's bucket build is
O(M * L)instead ofO(M * L^2): it tries to deduplicate cleverly and ends up missing some neighbours. The correct build is straightforward: for each word, for each position, generate the pattern. - The agent's bidirectional BFS does not swap to the smaller frontier: degrades to single-end. Eyeball the inner loop.
- The agent forgets that
endWordmay not be in the dictionary: it BFS-exhausts the reachable set and returns0. Correct but slow. The pre-check is one line. - The agent's BFS depth counts edges and returns depth, not depth + 1: off by one on every input.
- The agent treats
beginWordas a dictionary member or treats it as having to be in the dictionary: the spec is explicit; the agent's interpretation is often wrong.
The audit checklist for this code, in 30 seconds:
- Is the neighbour function
O(L * bucket)per word, notO(M * L)? - Is bidirectional BFS swapping to the smaller frontier?
- Are the
endWord not in dictandbegin == endshort-circuits at the top? - Does the return value count words, not edges?
A reviewer who can run that audit in one breath is the reviewer who ships scalable implicit-graph search. This problem is hard not because the algorithm is hard but because the engineering taste required to choose the right optimisation is. parikshan's drill is to compress that taste into a 30-second reflex. When you can read a BFS over an implicit graph and immediately ask "where is the bucket and is it bidirectional?", you have arrived.