course-schedule

Difficulty: Medium · Topic: Graphs · Practice it on parikshan: /practice/course-schedule/start

The story

The registrar of a large university hands you a list of courses and their prerequisites. The dean wants a one-line answer before the next senate meeting: can a student, in principle, complete every course offered? Or has someone, somewhere in the catalogue, accidentally written a rule that says course A requires B which requires A?

That question is a cycle-detection query on a directed graph. The same algorithm runs when npm checks for dependency cycles before publishing a package, when Bazel and Gradle decide whether a build graph is well-formed, when Make refuses to run because target X depends on target Y which depends on X, and when AWS CloudFormation rejects a stack because two resources reference each other circularly. It is, alongside topological sort, the most-shipped graph algorithm in working software.

What's actually being asked

n courses labelled 0..n-1. m rules of the form u v meaning "course u requires course v". Print true if there exists an order in which every course can be completed, false if the prerequisites form a cycle. Self-loops count as cycles; parallel edges do not.

The naive approach

Try every permutation of courses, check whether each is a valid order. O(n!). For n = 20 that is 2.4 quintillion. Useless. Graph traversal solves it in O(n + m).

The key insight

Every prerequisite rule u v is a directed edge from v (prerequisite) to u (dependent). A valid completion order exists if and only if this directed graph has no cycle, i.e., it is a DAG. So the question reduces to directed cycle detection.

Two canonical algorithms solve this in O(n + m). Both belong in your fingers; you will use them in your career more times than every other graph algorithm except plain BFS.

Kahn's algorithm (BFS-style, in-degree counting)

Compute the in-degree of every node. Seed a queue with every node of in-degree 0 (a "source"). Repeatedly: dequeue a node, count it as "taken", decrement the in-degree of each successor, enqueue successors that just dropped to in-degree 0. When the queue empties, you have taken some prefix of the nodes. If the taken count equals n, the graph is a DAG and the order in which you popped is a valid topological sort. If the taken count is less than n, the un-taken nodes form a sub-graph where every node has at least one un-taken predecessor, a cycle.

Three-colour DFS

Mark each node white (unseen), grey (currently on the DFS stack), or black (finished). DFS from each white node: paint grey on entry, recurse on every neighbour, paint black on exit. If during recursion you encounter a grey neighbour, that is a back edge that closes a cycle, return immediately. A black neighbour is a forward or cross edge and is safe. White neighbours recurse.

Both run in O(n + m). Kahn is iterative and easy to write under time pressure; three-colour DFS is the foundation of every more sophisticated graph algorithm (Tarjan's SCC, dominator trees, articulation points). Master both.

Step-by-step approach

Read n, m. Initialise indeg = [0] * n, adj = [[] for _ in range(n)].
For each rule u v: adj[v].append(u), indeg[u] += 1.
Initialise a deque with every i where indeg[i] == 0.
seen = 0.
While the deque is non-empty:
- Pop u. seen += 1.
- For each w in adj[u]: indeg[w] -= 1. If indeg[w] == 0, append w.
Print 'true' if seen == n, else 'false'.

Step-by-step approach (three-colour DFS, iterative)

color = [0] * n with 0=white, 1=grey, 2=black.
For each v in 0..n-1 with color[v] == 0:
- Push (v, iter(adj[v])) onto a manual stack, set color[v] = 1.
- While the stack is non-empty:
  - Peek the top frame (u, it). Try to advance the iterator.
  - If next neighbour w: if color[w] == 1 return 'true' (cycle); if color[w] == 0, push (w, iter(adj[w])), set color[w] = 1; if color[w] == 2, skip.
  - If iterator exhausted: set color[u] = 2, pop the frame.
If no cycle ever fired, print 'false' (the 'true'/'false' mapping is "is there a cycle?" inverted to "can the student finish?"; the problem asks the latter, so flip).

Note the flip: the algorithms detect cycles (truthy means a cycle); the problem returns 'true' for "no cycle". Match the spec at the print site.

Worked example

n = 4, m = 4
1 0
2 0
3 1
3 2

Adjacency from prerequisite to dependent: adj = [[1,2], [3], [3], []]. In-degrees [0, 1, 1, 2].

Initial queue: [0].
Pop 0. seen = 1. Decrement indeg[1] -> 0, indeg[2] -> 0. Queue: [1, 2].
Pop 1. seen = 2. Decrement indeg[3] -> 1. Queue: [2].
Pop 2. seen = 3. Decrement indeg[3] -> 0. Queue: [3].
Pop 3. seen = 4. Queue empty.
seen == n, print 'true'.

A cycle example would be 1 0 and 0 1. Initial in-degrees both 1. The queue starts empty. The loop terminates immediately with seen = 0 < 2. Print 'false'.

Complexity

Time: O(n + m).
Space: O(n + m) for the adjacency, O(n) for the in-degree array and the queue.

Common pitfalls

Edge direction confusion: "course u requires course v" parses to an edge v -> u, not u -> v. Get this backwards and the in-degree array is the in-degree of the wrong nodes.
Self-loops counted as DAGs: a self-loop u u adds 1 to indeg[u] with no offsetting decrement, so u never enters the queue, and Kahn's correctly returns 'false'. The three-colour DFS sees u grey then sees itself grey on the first neighbour step.
Parallel edges counted as cycles: two copies of u v add 2 to indeg[u] and 2 entries to adj[v]. Both decrement when v is popped. Output is unchanged.
Recursive DFS on a 1000-node chain: Python's default recursion depth is 1000. Iterative DFS or sys.setrecursionlimit(...) is required.
Forgetting to scan every node (DFS path): a single DFS from node 0 misses disconnected components. The outer scan over 0..n-1 covers them.

Where this shows up in the enterprise

Bazel and Gradle build DAGs: dependency cycle detection on build targets is exactly this algorithm. A failed build with "cycle detected: A -> B -> A" is Kahn's or three-colour DFS.
npm and pip dependency cycles: package managers refuse to install when the resolution graph has a cycle.
AWS CloudFormation dependency: stack creation aborts when resource A DependsOn B which DependsOn A.
Spreadsheets (Excel, Google Sheets): cell A1 referencing B1 referencing A1 produces a "circular reference" error, which is this algorithm running on the formula DAG.
Datadog service maps: detecting a service-call cycle at design time prevents production deadlocks.
kubernetes Pod scheduling: detecting an initContainer dependency cycle before scheduling.
Make: refuses to build when target.o depends on target.c which depends on target.o.
Database migrations: a forward-only schema-migration framework runs Kahn's on the migration DAG to compute the order. Cycle = corrupt migration history.
University catalogue tools (Banner, PeopleSoft): the literal university registrar story from the top.

The pattern "dependency manager refusing a circular plan" is one of the most-shipped pieces of code on the planet.

Variants you'll see elsewhere

course-schedule-ii: not just "can it be done" but "in what order"; Kahn returns the order; on a cycle, return empty.
alien-dictionary: derive the prerequisite edges from string comparisons, then run topological sort.
parallel-courses: same DAG, return the number of semesters if you can take any number of in-degree-0 courses in parallel.
detect-cycle (directed graph, standalone): the cycle-detection half of this problem alone; either Kahn or three-colour DFS.
find-eventual-safe-states: reverse the graph, return all nodes not on a cycle and not reaching one.

All of them: directed graph, in-degree counting or three-colour DFS, the same core in 30 lines of Python.

How parikshan turns this into a learning loop

This is the cluster's first "two valid algorithms, pick one and master both" problem. Recommended flow:

Practice mode, no AI: write Kahn's from scratch. Target: under 15 minutes.
Practice mode again: write three-colour DFS from scratch, iteratively. Target: under 20 minutes (recursion-to-iteration is the hard part).
AI training: Explain the equivalence. The agent should articulate why a topological order exists iff the graph is a DAG iff no back edge exists in DFS.
Exam mode: solve under 10 minutes with your preferred algorithm.

The agent's explanation of the equivalence is the high-leverage learning moment. The mechanics of either algorithm are 30 lines; the conceptual link "no cycle = in-degree-0 chain exists = topological order exists = no back edge in DFS" is the part that compounds across every graph problem you ever touch.

In the AI-integrated workspace

When you direct an agent to detect dependency cycles, the failure modes are predictable:

The agent reverses edge direction: "course u requires course v" is interpreted as u -> v instead of v -> u. The cycle answer is unaffected (a cycle in G is a cycle in G^reverse) but topological sort produces the wrong order. Audit the parsing step.
The agent uses recursive DFS on inputs the spec allows but the agent's tests do not exercise: passes the public tests, crashes on production input.
The agent forgets the outer scan over disconnected components: a single DFS from node 0 returns "DAG" on a graph whose only cycle is in a disconnected component.
The agent uses a visited set with two states instead of three colours: cannot detect back edges; only detects whether a node was visited at all. The DFS happily walks a cycle and returns "DAG".

A reviewer who can audit edge direction in five seconds and demand three colours (or in-degree counting) for cycle detection is the reviewer whose builds, packages, and deploys do not silently corrupt. parikshan's drill is to make these audits automatic. The cycle-detection skill returns on investment every time a CI build refuses a circular import; you will have written those refusals correctly by then.