roman-to-integer

Difficulty: Easy · Topic: Math, Matrix & Heaps · Practice it on parikshan: /practice/roman-to-integer/start

The story

A museum's digital archive at the Getty Center holds thousands of catalogue cards going back to the 1880s. Many cards still record dates and edition numbers in Roman numerals: MCMXCIV for 1994, MMXXV for 2025, CDXLIV for the 444th item in a series. The archive team needs to migrate this data into a modern database where dates and indices are integers. They ship you the strings; you ship back the integers.

That is roman-to-integer. It is also the toy version of every single-pass parser you will write: read one character at a time, decide its meaning from a tiny window of surrounding context, accumulate into a running result.

What's actually being asked

Given a string s of characters from I V X L C D M, convert it to its integer value. The catch is the six subtractive pairs: IV = 4, IX = 9, XL = 40, XC = 90, CD = 400, CM = 900. Everywhere else, characters add their face value.

The naive approach

Hard-code the six pairs. For every position, check whether the current two characters form IV, IX, XL, XC, CD, or CM; if so, add the pair value and skip two; otherwise add the single character's value and skip one. Correct, but six special cases in code is a code-review smell. Worse, it does not teach the general parsing pattern: peek at one character of context and decide locally.

The key insight

The six subtractive pairs all obey one simple rule: a symbol is subtracted if it sits immediately to the left of a strictly larger symbol; otherwise it is added. I before V (5) is subtracted: 5 - 1 = 4. X before C (100) is subtracted: 100 - 10 = 90. The local rule replaces the enumeration.

The cleanest implementation walks the string from right to left, comparing each symbol's value to the previously seen one. Right-to-left turns "is the next symbol larger?" (forward look-ahead, fiddly at the end) into "is the previous symbol larger?" (a single variable, no boundary check).

Step-by-step approach

Map each Roman symbol to its integer value: {I:1, V:5, X:10, L:50, C:100, D:500, M:1000}.
Initialise total = 0 and prev = 0.
Iterate over s reversed. For each character's value v:
- If v < prev, subtract: total -= v.
- Else, add: total += v.
- Update prev = v.
Return total.

Worked example

s = "MCMXCIV"   (expected 1994)

Reverse-walk:

ch='V', v=5, prev=0, 5 >= 0, total = 5, prev = 5.
ch='I', v=1, prev=5, 1 < 5, total = 5 - 1 = 4, prev = 1.
ch='C', v=100, prev=1, 100 >= 1, total = 4 + 100 = 104, prev = 100.
ch='X', v=10, prev=100, 10 < 100, total = 104 - 10 = 94, prev = 10.
ch='M', v=1000, prev=10, 1000 >= 10, total = 94 + 1000 = 1094, prev = 1000.
ch='C', v=100, prev=1000, 100 < 1000, total = 1094 - 100 = 994, prev = 100.
ch='M', v=1000, prev=100, 1000 >= 100, total = 994 + 1000 = 1994.

Result: 1994. The right-to-left walk handled CM, XC, and IV without any special casing.

Complexity

Time: O(n), a single pass over the string.
Space: O(1), a fixed-size value map and two integers.

For input bounded at 15 characters by the problem, this is essentially constant time. The complexity argument matters more as a pattern for longer parsers.

Common pitfalls

Special-casing six pairs: works, but the value-comparison rule is one branch instead of six. Code review will ask why you wrote it the long way.
Forward walk without bounds check: if you peek at s[i+1] from s[i], you must special-case i = n-1. The reverse walk eliminates this.
Validating input: the problem promises well-formed Roman numerals. Do not add validation that the problem does not ask for; that is engineering nervousness leaking into the solution. (In a museum-archive context, you would validate, but in a separate pass.)
Returning a string: read the spec. The output is the integer.

Where this shows up in the enterprise

The peek-ahead pattern is the entire foundation of single-pass parsers:

Compiler lexers (TypeScript, Rust, Go): read one character, decide whether it starts a multi-character token by peeking one ahead. = versus == versus === is exactly the same shape as I versus IV.
JSON and YAML parsers: decide whether a - begins a negative number or a list marker by looking at the next character.
URL routers (Express, Next.js): match path segments against route patterns one segment at a time, with one segment of look-ahead for parameter binding.
Date parsers (date-fns, Java's DateTimeFormatter): decide whether 01 is the start of a year or a day from neighbouring format hints.
Encoding decoders (UTF-8, Base64): the first byte of a UTF-8 sequence tells you how many follow. You peek the leading bits and consume the right number.
Configuration interpreters (HCL in Terraform, Nginx config): every directive parses with one-token look-ahead.

The Roman numeral problem is the smallest possible introduction. Once you have it, the leap to a real lexer is mostly bookkeeping.

Variants you'll see elsewhere

integer-to-roman: the reverse problem. Greedy: repeatedly take the largest symbol or subtractive pair that fits.
string-to-integer (atoi): parse a signed integer from a string, handling whitespace, sign, overflow, and trailing garbage. Same peek-and-decide rhythm.
valid-parentheses: stack-based parser, but the spirit (consume a character, decide its meaning from local context) is the same.
decode-ways: a digit-string parsing problem with two-character look-ahead, solved with dynamic programming on top of the same idea.

How parikshan turns this into a learning loop

After you read this article and click through to the practice link, the parikshan flow integrates four layers:

The editor runs your code against public tests and per-session dynamic private tests generated fresh, so a memorised solution does not pass. The dynamic suite for roman-to-integer covers single symbols, maximum-value MMMCMXCIX, and stacked subtractives like CDXLIV and CDXCIX.
Hints are graduated, with explicit penalties on each. The first hint nudges you toward thinking about subtractive cases; the third hands you the value-comparison rule; the fourth is the full skeleton.
The AI training assistant can Explain, Review, Find Bugs, Add Comments, or Optimise. Each costs integrity points in exam mode and is free in practice mode. Use practice mode to learn the pattern, exam mode to prove it.
The dispute flow sends an output mismatch you disagree with to a human reviewer who can override the verdict.

Solve in practice with the AI as a tutor. Then do it cold in exam mode under three minutes. That is the bar.

In the AI-integrated workspace

When you eventually build a real parser at work, an agent will write most of the code. Your edge is recognising the shape:

Spot that the spec describes a single-pass parse with at most one character of look-ahead. Tell the agent so.
Ask explicitly for the reverse walk if it simplifies the code; otherwise insist on the bounds check on the forward walk.
Read the agent's output and check for special cases that should not be there. Six hard-coded branches when one comparison suffices is a smell.
Stress-test the corners: single character, maximum length, stacked subtractives.

Engineers who can compress six special cases into one comparison are the ones who keep code reviewable as projects grow. Engineers who cannot end up maintaining the six-branch version forever.