reverse-words-in-string

Difficulty: Medium · Topic: Two Pointers · Practice it on parikshan: /practice/reverse-words-in-string/start

The story

Twitter's text indexing pipeline, when it builds reverse search indices for trending phrases, needs the word order of a tweet reversed for one specific suffix index. The tweets come in from a Kafka topic with arbitrary whitespace from copy-paste: leading spaces, trailing spaces, double spaces between words, occasional tabs. The normaliser must produce a clean output: words in reverse order, exactly one space between them, no leading or trailing whitespace. This runs at hundreds of thousands of tweets per second across the indexer fleet, so every allocation matters.

That is reverse-words-in-string. It is two problems stapled together: (a) tokenise correctly under messy whitespace, and (b) emit the tokens in reverse with exactly one separator. The lesson is more about knowing your language's split semantics than about pointer mechanics.

What's actually being asked

Given a string s, reverse the order of the words. A word is a maximal run of non-space characters. Output requirements:

Words in reverse order.
Single space between consecutive words.
No leading or trailing whitespace.
Empty input or all-whitespace input produces an empty line.

The naive approach

Manual character scanning: walk the string, accumulate characters into a buffer, push the buffer onto a list when you hit a whitespace and the buffer is non-empty, reset the buffer; flush at the end; reverse the list; join with a single space. O(n) time, O(n) space. Correct, verbose, the right answer in a language without a smart split.

What people try first and get wrong: s.split(' ') (note the explicit single-space argument) returns a list with empty strings for every double space. You then have to filter them out. The argument-less s.split() does the right thing for free.

The key insight

str.split() in Python, called with no argument, has two crucial behaviours different from str.split(' '):

It splits on any run of whitespace (spaces, tabs, newlines), not a single space character.
It drops empty fields at the start, end, and between runs.

That is exactly the tokenisation this problem asks for. The whole solution collapses to:

print(' '.join(reversed(s.split())))

s.split() produces the clean word list. reversed(...) walks it backwards. ' '.join(...) emits exactly one space between consecutive words. print adds the trailing newline.

The two-pointer connection is more subtle here than in the rest of the cluster. The language-level split is implemented as a two-pointer scan that finds word boundaries; you are paying for that scan once, plus another O(n) for the join. The skill is in reaching for the standard library instead of hand-rolling a buggy scanner.

Step-by-step approach

The library version:

Read the line, strip the trailing newline.
Compute words = s.split().
Print ' '.join(reversed(words)).

The manual version (language-agnostic):

Walk s left to right with a buffer.
For each character: if it is not whitespace, append to the buffer; else if the buffer is non-empty, push the buffer onto a list and clear it.
After the loop, flush the buffer if non-empty.
Reverse the list. Print with a single space between elements.

The in-place double-reverse trick (only useful in mutable-string languages like C):

Reverse the entire string.
Walk through and reverse each word in place.
Compact multiple spaces and trim.

Mutable-string languages reward this trick because it is O(1) extra memory. Python strings are immutable so it does not help.

Worked example

s = "  hello world  "

s.split() → ["hello", "world"]. Leading/trailing/double spaces vanished.
reversed(...) → iterator yielding "world", then "hello".
' '.join(...) → "world hello".

Output: world hello.

For s = "a good example":

s.split() → ["a", "good", "example"].
Reversed and joined: example good a.

For s = " " (all whitespace):

s.split() → [].
' '.join(reversed([])) → "". Print an empty line.

Complexity

Time: O(n). One pass for split, one pass for join.
Space: O(n) for the word list and the joined output. Strings are immutable; you cannot do this in O(1) extra space in Python.

Common pitfalls

s.split(' ') with an explicit single-space argument. Returns ["", "hello", "world", ""] for " hello world ". You must then filter out empties. The argument-less form does this for you.
Stripping with s.strip() and then doing single-space split. Better, but still wrong on inputs with internal double spaces ("a b" → ["a", "", "b"]).
Reversing the string, not the word list. s[::-1] flips characters, not words. "the sky" becomes "yks eht", not "sky the".
Forgetting that the output of a reversed(...) call is an iterator. Calling ' '.join(reversed(words)) works because join accepts an iterable, but print(reversed(words)) would print the iterator's repr.
Hand-rolling and forgetting to flush the buffer at the end. Losing the last word is the most common manual-tokeniser bug.
Treating non-ASCII whitespace. The constraint here is ASCII-only, but in production str.split() also matches \t, \n, \r. Know that before you go to lunch.

Where this shows up in the enterprise

Twitter / X text indexing: cleaning whitespace and reversing word order for suffix-style queries.
Google Search query normalisation: tokenisation of raw query strings before stemming.
Slack message search: the same tokenise-and-clean step before passing to Lucene/Elasticsearch.
LinkedIn job-title parsing: extracting structured words from messy titles like " Senior Software Engineer ".
GitHub commit message linting: enforce single-spacing in commit-message bodies via the same normaliser.
Spotify lyrics search: per-line tokenisation before phrase matching.
Stripe Radar text features: cleaning merchant descriptors and notes before feature extraction.
WhatsApp message preview generation: clean whitespace and trim before rendering on a notification.

The pattern is "messy whitespace in, clean tokens out, then transform". The cleaning step is the load-bearing one and every text-processing system has it.

Variants you'll see elsewhere

reverse-string: reverse the characters, not the words.
reverse-words-in-string-ii: in-place version with mutable strings, the double-reverse trick.
reverse-words-in-string-iii: reverse each word individually while keeping word order, the opposite operation.
length-of-last-word: tokenise, return the length of the last token; same split instinct.
most-common-word: tokenise plus a Counter.
valid-anagram over words: tokenise plus multiset compare.

How parikshan turns this into a learning loop

After you read this article and click through to the practice link, the parikshan flow integrates four layers:

The editor runs your code against public tests for instant feedback and against per-session dynamic private tests (generated freshly per session). Synthetic cases hammer leading and trailing spaces, double spaces, single-word inputs, empty inputs, and all-whitespace inputs.
Hints are graduated. Level 1 splits the problem into "tokenise + reverse". Level 4 hands you the one-liner.
The AI training assistant in exam mode offers Explain, Review, Find Bugs, Add Comments, Optimise, each at an integrity cost. Practice mode is free. This is a great problem to ask the AI to compare s.split() vs s.split(' ').
The dispute flow runs after the verdict. The most common dispute is "my output looks right but has a trailing space". That is the spec rejecting your s.split(' ') version producing empty fields. Read the diff.

Read this concept, attempt in practice mode, then redo in exam mode. When you can write the one-liner reflexively and explain why split() differs from split(' '), the pattern is yours.

In the AI-integrated workspace

In your future job, the agent writes the code. Your job is to:

Recognise that the hard part of any "reverse the words" problem is tokenisation, not the reversal. The recognition lets you specify the right behaviour to the agent.
Tell the agent precisely: "use str.split() with no argument, then reverse the list, then ' '.join". Not "reverse the words", that may produce a single-space-split-then-filter mess.
Read the generated code and verify: no explicit ' ' argument to split, no leading or trailing space in the output, no per-character buffer scanner unless your language demands it.
Push the edge cases the agent will miss: empty input, all-whitespace input, single-word input with surrounding spaces, and inputs with tabs or non-breaking spaces if the spec allows them.

Candidates who reach for the right split overload in five seconds turn agents into leverage. Candidates who don't know the difference end up shipping [" ", " "].filter(x => x.length > 0) chains that are correct, slow, and embarrassing in review. parikshan is built to train the first kind.