detect-cycle

Difficulty: Medium · Topic: Graphs · Practice it on parikshan: /practice/detect-cycle/start

The story

A platform team owns the service-dependency tracker. A new deploy reaches production and an alert fires: "deadlock detected, service A blocked waiting for service B blocked waiting for service A". The on-call engineer pulls up the service map. They need to know, in seconds, whether the live dependency graph contains a cycle and which services are inside it. The recovery action depends on the answer.

This is detect-cycle. It is the structural twin of course-schedule, separated from it because cycle detection is a building block that recurs in places without prerequisites: dependency injection containers (Spring, Dagger), npm and pip resolvers, kubernetes Pod scheduling order, AWS CloudFormation rollouts, Datadog service maps, Bazel and Gradle build DAGs, and every distributed deadlock detector ever written. Treating cycle detection as its own pattern, with its own canonical implementation, is the industrial-strength move.

The cluster guidance is to spend extra detail here. The reason: the three-colour DFS pattern is the foundation for Tarjan's strongly-connected components, articulation-point detection, bridge-finding, and dominator-tree construction. Internalising the three colours is a force multiplier across every advanced graph topic.

What's actually being asked

A directed graph on n nodes with adjacency list adj. Print true if the graph contains any cycle (including self-loops); false otherwise. Parallel edges do not constitute a cycle by themselves.

The naive approach

Try every node. For each, run a DFS that records the current path. If during DFS you revisit a node already on the path, declare a cycle. Without the "current path" distinction (i.e., only tracking visited), the algorithm gives wrong answers: a node u reached via two different forward paths is not a cycle, yet two-state visited tracking mis-reports it.

That is the entire reason three colours exist. Two states is insufficient; three is the minimum that distinguishes "on the current stack" from "fully explored".

The key insight

Mark each node with one of three colours:

WHITE (0): unvisited.
GREY (1): currently on the DFS stack (we entered it but have not finished exploring its descendants).
BLACK (2): fully explored (we entered and exited; all descendants are also BLACK).

DFS from each WHITE node. On entry to a node, paint it GREY. Examine each neighbour:

GREY neighbour: this is a back edge. The neighbour is on the current path, so an edge from us to it closes a cycle. Return TRUE immediately.
WHITE neighbour: recurse.
BLACK neighbour: the neighbour was visited and exited cleanly. The edge to it is a forward edge or cross edge. Not a cycle. Move on.

When all of a node's neighbours are exhausted, paint it BLACK and return.

The crucial discrimination is GREY vs BLACK. A two-state scheme cannot tell them apart, so it flags forward and cross edges as cycles. That is why "visited / unvisited" cycle detection is wrong on directed graphs. (On undirected graphs the rules are different and two-state plus parent tracking suffices; this problem is directed.)

Why GREY != BLACK matters, spelt out: imagine a graph 0 -> 1, 0 -> 2, 1 -> 2. DFS from 0 paints 0 GREY, enters 1 (GREY), enters 2 (GREY then BLACK), returns to 1 (BLACK), returns to 0. Now 0 examines its second neighbour 2. With two-state visited tracking, 2 is "visited" and the algorithm declares a cycle. WRONG: this is a DAG. With three colours, 2 is BLACK and the algorithm correctly skips it.

That single example is the strongest argument for three colours that exists. Internalise it and you will never write the two-state bug.

Step-by-step approach

Parse n and adj (JSON array).
color = [0] * n, with 0=WHITE, 1=GREY, 2=BLACK.
For each v in 0..n-1 with color[v] == 0:
- Push (v, 0) onto a manual stack (the 0 is the next-neighbour-index). Set color[v] = 1.
- While the stack is non-empty:
  - Peek the top frame (u, i).
  - If i < len(adj[u]):
    - w = adj[u][i]. Advance the frame's i by 1.
    - If color[w] == 1: return TRUE (back edge to a GREY node = cycle).
    - If color[w] == 0: set color[w] = 1, push (w, 0).
    - If color[w] == 2: skip (forward or cross edge).
  - Else (frame exhausted): set color[u] = 2, pop the frame.
If no cycle was ever returned, print 'false'. Otherwise print 'true'.

The iterative version is mandatory: a 10,000-node chain blows Python's default recursion limit. The frame stores (node, next_neighbour_index) so we can resume neighbour iteration after a recursive descent returns.

Worked example

n = 3
adj = [[1], [2], [0]]

This is the cycle 0 -> 1 -> 2 -> 0.

All WHITE. Outer loop picks 0. Set color[0] = GREY. Stack [(0, 0)].
Peek (0, 0). Neighbour adj[0][0] = 1. Advance to (0, 1). color[1] = 0 (WHITE) -> set color[1] = GREY, push (1, 0). Stack [(0, 1), (1, 0)].
Peek (1, 0). Neighbour 2. Advance to (1, 1). color[2] = 0 -> set color[2] = GREY, push (2, 0). Stack [(0, 1), (1, 1), (2, 0)].
Peek (2, 0). Neighbour 0. color[0] = GREY. Return TRUE.

Output 'true'. The self-loop [[0]] works the same way: enter 0, paint GREY, examine neighbour 0, GREY, cycle.

A DAG example: adj = [[1], [2], [3], []]. 0 -> GREY, push. Neighbour 1 WHITE -> GREY, push. Neighbour 2 WHITE -> GREY, push. Neighbour 3 WHITE -> GREY, push. 3 has no neighbours -> BLACK, pop. Back to 2. Iterator exhausted -> BLACK, pop. Back to 1. Iterator exhausted -> BLACK, pop. Back to 0. Iterator exhausted -> BLACK, pop. Outer loop sees no more WHITE. Return FALSE.

Complexity

Time: O(V + E).
Space: O(V) for colours and stack.

Common pitfalls

Two-state visited tracking on a directed graph: incorrectly flags forward and cross edges as cycles. The three-colour scheme is the canonical fix.
Recursive DFS on a 10,000-node chain: Python's default recursion depth (~1000) gets exceeded. The iterative version with (node, neighbour_index) frames is required.
Marking BLACK on entry instead of on exit: defeats the whole point. BLACK must mean "all descendants are also done", which is only true after the recursion returns.
Forgetting the outer scan over all n nodes: a single DFS from node 0 misses cycles in disconnected components.
Treating parallel edges as cycles: two copies of u -> v are still just one edge for cycle purposes. The algorithm naturally handles this because the second visit to BLACK v is a no-op.
Not considering self-loops: a self-loop is a back edge from GREY to itself. The algorithm catches it; do not special-case it.

Where this shows up in the enterprise

Bazel and Gradle build DAGs: cycle detection on dependency targets before scheduling the build.
npm and pip dependency cycles: refuse to install when the dependency graph cycles.
AWS CloudFormation dependency: stack creation aborts when DependsOn cycles are detected.
Spring, Dagger, and Guice DI containers: report a "circular dependency" error when bean construction graph cycles; that report is three-colour DFS in production.
Datadog service maps: detect call cycles between microservices and surface them as design warnings before they become production deadlocks.
kubernetes Pod scheduling: detect initContainer dependency cycles before scheduling.
Distributed deadlock detection: when transactions wait on each other, the "wait-for graph" cycle is a deadlock; the detector is three-colour DFS over the wait-for graph.
Compiler imports: detecting a circular import in Python (import a which imports b which imports a) is essentially three-colour DFS over the module graph.
Database referential integrity: enforcing acyclic foreign-key references in a schema migration is cycle detection on the FK graph.
CI/CD pipeline definitions: GitHub Actions and CircleCI refuse a job dependency cycle.

The pattern "structural integrity check before doing the thing" is one of the most-shipped pieces of code in working software.

Variants you'll see elsewhere

course-schedule: same algorithm, presented as "can the student finish?".
course-schedule-ii: cycle detection plus topological order.
find-eventual-safe-states: nodes that are not on a cycle and reach no cycle. Reverse graph + topological sort, or three-colour DFS with a "result-is-known" memo.
strongly-connected-components (Tarjan, Kosaraju): three-colour DFS plus a low-link / second-pass mechanism.
bridges-and-articulation-points: three-colour DFS plus a "discovery time" mechanism.

Three-colour DFS is the foundation. The variants add bookkeeping; the colours stay the same.

How parikshan turns this into a learning loop

This is the cluster's "industrial pattern in 30 lines" problem. The recommended flow:

Practice mode, no AI: write recursive three-colour DFS first. The mental model "GREY = on stack, BLACK = done" is what you need.
Submit. If your tests include a 10,000-node chain, watch it crash with RecursionError.
AI training: Explain how to convert the recursion to an iterative stack with (node, neighbour_index) frames. The agent should walk through the resume-after-descent mechanic.
Practice mode: iterative version. Submit, pass.
AI training: Find Bugs on a wrong two-state version of yours. The agent should call out the GREY/BLACK conflation.
Exam mode: redo from blank under 20 minutes.

The two-state-is-wrong lesson is the highest-leverage one in the cluster. Internalise it through this problem and you will never ship the bug.

In the AI-integrated workspace

When you direct an agent to write directed cycle detection, the failure modes are exactly the pitfalls above:

The agent uses a two-state visited set: the code looks clean and is wrong. It flags forward and cross edges as cycles. The public tests (with no forward edges) pass; the private DAG-with-many-edges test fails.
The agent uses recursive DFS: passes the small tests, blows the stack on the 10,000-node chain. Demand iterative.
The agent skips the outer loop over all nodes: misses cycles in disconnected components.
The agent paints BLACK at the wrong moment: paints on entry instead of exit. Equivalent to two-state. Wrong.
The agent uses Kahn's algorithm and emits true / false: that is actually correct and arguably easier to write. The trade-off is that Kahn cannot tell you which nodes are in the cycle, while three-colour DFS can (the GREY set at the moment of detection is the cycle prefix). For "just yes/no", either works. For "give me the cycle", three-colour DFS wins.

The 30-second audit on cycle-detection code:

Are there three states or two? Three is correct on directed graphs.
Is the BLACK transition on exit, not on entry?
Is there an outer scan over all nodes?
Is the DFS iterative for large inputs?

Four yeses and the code is correct. A reviewer who can run that audit in one breath is the reviewer whose DI containers, build systems, and migration runners do not deadlock in production. That is the entire skill. parikshan's job is to compress it into your fingers; this problem is where it happens.

The reason this problem matters beyond the immediate cluster: three-colour DFS is the entry point into Tarjan's SCC algorithm, articulation-point detection, dominator-tree construction, and the entire family of "DFS with bookkeeping" techniques. Every one of them starts from "white, grey, black". Internalise that here and the next decade of graph problems opens up at a discount.