remove-duplicates-from-sorted-array

Difficulty: Easy · Topic: Two Pointers · Practice it on parikshan: /practice/remove-duplicates-from-sorted-array/start

The story

ClickHouse stores each table's data as a sorted run on disk. After a routine merge, the engine sometimes finds duplicate primary-key values that the ReplacingMergeTree policy must collapse to a single record. The merge is in place, the buffer is already exactly the size of the sorted run, with duplicates sitting next to their twins because of the sort. The task: compact the duplicates out so each key appears once, leaving the unique values at the front of the buffer and returning the new length. No extra allocations; the buffer cannot grow.

That is remove-duplicates-from-sorted-array. The two-pointer slow/fast trick is the workhorse of every in-place compaction problem, and this is the cleanest place to learn it.

What's actually being asked

Given a sorted (non-decreasing) integer array nums, remove duplicates in place so each value appears at most once. Print:

on the first line, k, the count of unique values;
on the second line, the k unique values, space-separated.

The naive approach

Build a fresh list, appending each value only if it differs from the last appended one. O(n) time, O(n) space. Easy. Wastes memory you did not need to waste.

A more obvious wrong approach: shift the array left every time you find a duplicate. That is O(n²) and times out on dense duplicates.

The key insight

The array is sorted. All copies of a value are adjacent. So you do not need a hash set, a Counter, or a fresh list. You only need to know: "is this current value different from the last value I have already accepted?" A single comparison.

Two pointers, slow/fast flavour:

write (the slow one), points to the slot where the next accepted unique value should land. Its invariant: nums[0..write-1] are the unique values found so far, in order.
read (the fast one), scans left to right.

For each read, if nums[read] != nums[write - 1], copy nums[read] into nums[write] and advance write. If they are equal, skip; the value is already represented by nums[write - 1].

The first element is trivially unique (the array is non-empty under the constraints), so initialise write = 1 and start scanning from read = 1.

Step-by-step approach

If n == 0, print 0 then an empty line. (Not possible given n >= 1 constraints, but defensive code is cheap.)
Set k = 1. The first element of the result is always nums[0].
For i from 1 to n - 1:
- If nums[i] != nums[k - 1], copy nums[i] into nums[k] and increment k.
Print k on a line by itself.
Print the first k values of nums, space-separated, on the next line.

Worked example

nums = [0, 0, 1, 1, 1, 2, 2, 3, 3, 4]
n = 10

i	nums[i]	nums[k-1]	action	k after	nums prefix
1	0	0	skip	1	[0]
2	1	0	write at 1, k=2	2	[0,1]
3	1	1	skip	2	[0,1]
4	1	1	skip	2	[0,1]
5	2	1	write at 2, k=3	3	[0,1,2]
6	2	2	skip	3	[0,1,2]
7	3	2	write at 3, k=4	4	[0,1,2,3]
8	3	3	skip	4	[0,1,2,3]
9	4	3	write at 4, k=5	5	[0,1,2,3,4]

Output: 5 and 0 1 2 3 4.

Complexity

Time: O(n). One pass.
Space: O(1) extra. We mutate nums in place.

Common pitfalls

Comparing nums[i] to nums[i - 1] instead of nums[k - 1]. Works on dense duplicates but breaks when you have multiple stretches of duplicates separated by unique stretches: you would re-write the same value into the prefix and double-count. Always compare to the last accepted value, not the previous read value.
Starting k at 0. Then on i = 0 you compare to nums[-1] (Python) which is the last element. Use k = 1 and start the loop at i = 1.
Reading nums[k - 1] when k == 0. Defensive: short-circuit for empty input.
Printing the whole nums array. The buffer may have stale values past index k. Print only the first k slots.
Returning a new list. The interview/competitive variant of this problem expects in-place mutation. The judge here just compares the printed output, but the skill is the in-place pattern.

Where this shows up in the enterprise

ClickHouse / Apache Pinot / Druid compaction: ReplacingMergeTree and friends collapse duplicate keys in sorted runs.
Stripe Radar deduplication: time-sorted event streams with synthetic replays remove duplicates before scoring.
Google Ads conversion attribution: post-sort deduplication of conversion events by ID.
AWS Athena partition deduplication: each partition's sorted-by-key data has duplicates removed during repair.
GitHub Actions step caching: sorted-by-step-key list, dedupe before resolving hits.
Spotify "Wrapped" generation: per-user sorted listen events, collapse to unique tracks before counting plays.
Bloomberg ticker history: sorted-by-timestamp tick stream where the same trade arrives twice from two feeds; the second is dropped via this pattern.

The shape is: sorted run, collapse adjacent duplicates in place. Any pipeline that emits sorted data eventually reaches for this.

Variants you'll see elsewhere

remove-duplicates-from-sorted-array-ii: allow each value up to twice. Same skeleton, compare to nums[k - 2] instead of nums[k - 1].
remove-element: filter out a specific value; same write-pointer trick.
move-zeroes: the un-sorted cousin, same write-pointer, different predicate.
remove-duplicates-from-sorted-list: linked-list version.
unique-subsequence-count: variant that counts distinct values without mutation.

How parikshan turns this into a learning loop

After you read this article and click through to the practice link, the parikshan flow integrates four layers:

The editor runs your code against public tests for instant feedback and against per-session dynamic private tests (generated freshly per session). Synthetic cases include n = 1 (always answer 1), all-same arrays (always answer 1), all-unique (answer n), and arrays with single duplicates scattered.
Hints are graduated. Level 1 prompts you about adjacency in sorted arrays. Level 4 hands you the full slow/fast recipe.
The AI training assistant offers Explain, Review, Find Bugs, Add Comments, Optimise in exam mode at an integrity cost. In practice mode it is free.
The dispute flow runs after the verdict. Most disputes here are "my answer is right but the output format is wrong", read the spec carefully.

Read this concept, attempt in practice mode, take a hint only when stuck, then redo in exam mode. When the slow/fast skeleton is muscle memory you can write it in under two minutes.

In the AI-integrated workspace

In your future job, the agent writes the code. Your job is to:

Recognise the slow/fast pattern as the right tool any time you see "in place" plus "sorted" plus "filter or collapse". The recognition is the skill.
Tell the agent precisely: "two pointers, write-index trails read-index, compare to the last accepted value". Not "remove duplicates", that gets you a set(nums) which destroys order and uses extra memory.
Read the generated code and verify: the comparison is against nums[k - 1] (not nums[i - 1]), k starts at 1, and the buffer past k is not printed.
Push the edge cases the agent will miss: n == 1, all equal, all distinct, very large duplicates run-length.

Candidates who recognise the slow/fast pattern in five seconds turn agents into leverage. Candidates who can't end up approving code that allocates a fresh list and forgets the in-place requirement. parikshan is built to train the first kind.