pacific-atlantic-water-flow

Difficulty: Medium · Topic: Graphs · Practice it on parikshan: /practice/pacific-atlantic-water-flow/start

The story

A hydrology client gives you a digital elevation model of a rectangular island. Water on a cell flows to any 4-neighbour of equal or lower height. The top and left edges drain into the Pacific; the bottom and right edges drain into the Atlantic. Insurance underwriters want every cell from which water can reach both oceans. Those cells form the "split watershed", and parametric flood policies key off it.

The naive solver runs a BFS from each cell asking "does this reach both oceans?". Three orders of magnitude too slow. The article you are reading is about the inversion trick that takes the problem from O((r·c)^2) to O(r·c). It is one of the most beautiful re-framings in the graph cluster.

What's actually being asked

Given an r × c matrix of heights, print every (i, j) from which water can reach both the Pacific border (row 0 or column 0) and the Atlantic border (row r-1 or column c-1). Water flows to a neighbour of equal or lower height. Output rows in row-major order.

The naive approach

For every cell (i, j), run a BFS that follows "flow downhill" edges and check whether it reaches both ocean borders. That is r·c BFS sweeps, each O(r·c). Total O((r·c)^2). On a 200x200 grid that is 1.6 * 10^9 operations: minutes in Python. Useless.

The key insight

Reverse the flow. Instead of asking "from each cell, where can water reach?", ask "from each ocean, which cells can the ocean reach if water flowed UPHILL?". A single ocean's reachable set is computed with one multi-source BFS seeded with every border cell of that ocean. Two such BFS sweeps, one per ocean, produce two boolean reachability matrices. The answer is their intersection.

Why this works: a cell X can flow forward to the Pacific iff there exists a downhill (or flat) path from X to some Pacific border cell. Reverse every edge in that path and you have an uphill (or flat) path from a Pacific border cell to X. The two questions are dual. The forward formulation has r·c starting points and one target shape; the reverse formulation has one set of starting points and r·c candidate targets. The reverse runs once per ocean, not once per cell.

There is one subtlety that traps every newcomer: the comparison flips from "neighbour height <= me" (forward) to "neighbour height >= me" (reverse). The = part of >= is not optional. Water flows across flat ground in both directions; the ocean must therefore reach equal-height neighbours when walking inward. Using strict > silently kills the all-flat case.

Step-by-step approach

Parse the height grid r × c.
Allocate pac[r][c] and atl[r][c], both boolean, all False.
Multi-source BFS for the Pacific:
- Seed a deque with every cell of row 0 and every cell of column 0. Mark them in pac.
- Pop (x, y), for each 4-neighbour (nx, ny) that is in bounds, not yet in pac, and has height grid[nx][ny] >= grid[x][y]: mark pac[nx][ny] = True and enqueue.
Multi-source BFS for the Atlantic, identical, seeded with row r-1 and column c-1, marking atl.
Iterate the grid in row-major order; emit i j for every (i, j) where pac[i][j] and atl[i][j].
Output one i j per line.

Worked example

Pacific reverse-BFS seeds row 0 and column 0. From (0,4)=5, it cannot climb to its right neighbour (out of grid) but column 0 cells march downward. From (2,2)=5 it is reached via the chain (0,0) -> (1,0)=3 -> (2,0)=2 -> (2,1)=4 -> (2,2)=5, all non-decreasing.

Atlantic reverse-BFS seeds row 4 and column 4. The cell (2,2)=5 is reached via (4,4)=4 -> (3,4)=5 -> (3,3)=4, wait, that decreases. Try (4,4)=4 -> (4,3)=2, no, that decreases too. The correct chain is (0,4)=5 -> (1,4)=4 -> (1,3)=4 -> (2,3)=3, all going inward with non-strictly-increasing height. The cells that satisfy both flags are exactly the seven listed in the problem statement: 0 4, 1 3, 1 4, 2 2, 3 0, 3 1, 4 0.

The point of the worked example is not the bookkeeping; it is to convince you that the reverse-BFS frontier expands inward correctly when the comparison is >=.

Complexity

Time: O(r · c). Two BFS sweeps, each visiting every cell at most once, plus a final scan.
Space: O(r · c) for two reachability matrices and the BFS queues.

For a 200x200 grid that is 40,000 operations per sweep. Sub-millisecond.

Common pitfalls

Using strict > instead of >=: kills every flat-plateau test case, including the trivial all-equal grid. Water flows across flat ground both ways; the ocean's reverse BFS must climb equal-height edges.
Single-source BFS from each border cell: equivalent in correctness, catastrophic in performance. Seed the queue with every border cell before the first pop; that is the multi-source BFS optimisation.
Marking on pop instead of on enqueue: pushes the same cell through the queue many times via different parents. Mark on enqueue.
One reachability matrix shared between oceans: the two passes must keep separate flags. The intersection is computed at the end.
Output via print() per line on a 40,000-line grid: 40,000 syscalls is slow. Use sys.stdout.write('\n'.join(...)).

Where this shows up in the enterprise

Insurance flood modelling (Munich Re, Swiss Re): computing the watershed split is a direct application; the algorithm runs over high-resolution DEMs from the USGS or Copernicus.
Google Maps routing for hydrological networks: which lakes drain to which river system is a multi-source BFS over an elevation graph.
Disaster response triage at NASA AID and Red Cross: dual-watershed cells flood from either side, so they are higher-risk than single-watershed cells of the same elevation.
Cloudflare attack-traffic clustering (analogy): when a multi-source botnet attacks from many points, asking "which targets are reachable from both Source-Set-A and Source-Set-B?" is the same intersection-of-multi-source-BFS pattern.
Spotify recommendations: starting from a set of seed tracks and walking the similarity graph outward to find "tracks reachable from both my workout playlist and my chill playlist" is the same algorithm on a non-grid graph.
AWS CloudFormation rollback planning: starting from two failure points and finding the intersection of affected resources is the same intersection-of-reverse-traversals pattern.
Twitter/X feed graph: "users reached by retweet from both source clusters" is multi-source BFS intersection over the follow graph.

The pattern "two simultaneous expansions, intersect the reach sets" is the take-away, not the elevation grid.

Variants you'll see elsewhere

surrounded-regions: one reverse BFS from the border; mark "reaches border", then flip everything that doesn't.
walls-and-gates: multi-source BFS from all gates, propagating shortest distance into rooms.
rotting-oranges: multi-source BFS from rotten cells; same template, different stopping rule.
01-matrix: multi-source BFS from every zero, computing the distance to the nearest zero for every cell.
as-far-from-land-as-possible: multi-source BFS from all land, return the maximum distance from a water cell.

The pattern in all of them: seed many sources, expand outward, mark on enqueue. Master pacific-atlantic-water-flow and you have all of them.

How parikshan turns this into a learning loop

This is the first problem where the "reverse the question" insight is more valuable than the implementation. Recommended flow:

Practice mode, no AI: attempt the forward formulation. Watch it time out on the 200x200 stress test.
AI training: Explain with the prompt "is there a way to avoid running BFS from every cell?". The agent should suggest reversing.
Practice mode again, this time with the reverse formulation. Write the multi-source BFS carefully and run it.
Exam mode: redo from blank. If the inversion is in your fingers, this takes under ten minutes.

The (forward, reverse) framing is one of the highest-leverage tools in graph problems. Internalising it through this problem will pay off when you next see "from each X, can it reach a Y?". Reverse it. Almost always faster.

In the AI-integrated workspace

When you direct an agent to write this, the first cut will almost always be the forward formulation. Your review must include:

The complexity check: O((r·c)^2) or O(r·c)? Make the agent answer this in writing.
The >= vs > audit: a single character difference that erases every flat-plateau test case.
The multi-source seeding: was every border cell pushed before the first pop?
The mark-on-enqueue invariant: marking only on pop is the classic BFS bug.
The output assembly: \n.join vs. 40,000 print calls.

A senior engineer reviewing this kind of code looks at the BFS comparator first and the seeding second. Both are one-line bugs that pass the public tests and fail the private tests. parikshan's exam mode runs both, which is why the integrity score on this problem is a strong predictor of whether the candidate has actually internalised multi-source BFS or just memorised the template.