remove-duplicates-from-sorted-array
Difficulty: Easy · Topic: Two Pointers · Practice it on parikshan: /practice/remove-duplicates-from-sorted-array/start
The story
ClickHouse stores each table's data as a sorted run on disk. After a routine merge, the engine sometimes finds duplicate primary-key values that the ReplacingMergeTree policy must collapse to a single record. The merge is in place, the buffer is already exactly the size of the sorted run, with duplicates sitting next to their twins because of the sort. The task: compact the duplicates out so each key appears once, leaving the unique values at the front of the buffer and returning the new length. No extra allocations; the buffer cannot grow.
That is remove-duplicates-from-sorted-array. The two-pointer slow/fast trick is the workhorse of every in-place compaction problem, and this is the cleanest place to learn it.
What's actually being asked
Given a sorted (non-decreasing) integer array nums, remove duplicates in place so each value appears at most once. Print:
- on the first line,
k, the count of unique values; - on the second line, the
kunique values, space-separated.
The naive approach
Build a fresh list, appending each value only if it differs from the last appended one. O(n) time, O(n) space. Easy. Wastes memory you did not need to waste.
A more obvious wrong approach: shift the array left every time you find a duplicate. That is O(n²) and times out on dense duplicates.
The key insight
The array is sorted. All copies of a value are adjacent. So you do not need a hash set, a Counter, or a fresh list. You only need to know: "is this current value different from the last value I have already accepted?" A single comparison.
Two pointers, slow/fast flavour:
write(the slow one), points to the slot where the next accepted unique value should land. Its invariant:nums[0..write-1]are the unique values found so far, in order.read(the fast one), scans left to right.
For each read, if nums[read] != nums[write - 1], copy nums[read] into nums[write] and advance write. If they are equal, skip; the value is already represented by nums[write - 1].
The first element is trivially unique (the array is non-empty under the constraints), so initialise write = 1 and start scanning from read = 1.
Step-by-step approach
- If
n == 0, print0then an empty line. (Not possible givenn >= 1constraints, but defensive code is cheap.) - Set
k = 1. The first element of the result is alwaysnums[0]. - For
ifrom1ton - 1:- If
nums[i] != nums[k - 1], copynums[i]intonums[k]and incrementk.
- If
- Print
kon a line by itself. - Print the first
kvalues ofnums, space-separated, on the next line.
Worked example
nums = [0, 0, 1, 1, 1, 2, 2, 3, 3, 4]
n = 10
| i | nums[i] | nums[k-1] | action | k after | nums prefix |
|---|---|---|---|---|---|
| 1 | 0 | 0 | skip | 1 | [0] |
| 2 | 1 | 0 | write at 1, k=2 | 2 | [0,1] |
| 3 | 1 | 1 | skip | 2 | [0,1] |
| 4 | 1 | 1 | skip | 2 | [0,1] |
| 5 | 2 | 1 | write at 2, k=3 | 3 | [0,1,2] |
| 6 | 2 | 2 | skip | 3 | [0,1,2] |
| 7 | 3 | 2 | write at 3, k=4 | 4 | [0,1,2,3] |
| 8 | 3 | 3 | skip | 4 | [0,1,2,3] |
| 9 | 4 | 3 | write at 4, k=5 | 5 | [0,1,2,3,4] |
Output: 5 and 0 1 2 3 4.
Complexity
- Time: O(n). One pass.
- Space: O(1) extra. We mutate
numsin place.
Common pitfalls
- Comparing
nums[i]tonums[i - 1]instead ofnums[k - 1]. Works on dense duplicates but breaks when you have multiple stretches of duplicates separated by unique stretches: you would re-write the same value into the prefix and double-count. Always compare to the last accepted value, not the previous read value. - Starting
kat0. Then oni = 0you compare tonums[-1](Python) which is the last element. Usek = 1and start the loop ati = 1. - Reading
nums[k - 1]whenk == 0. Defensive: short-circuit for empty input. - Printing the whole
numsarray. The buffer may have stale values past indexk. Print only the firstkslots. - Returning a new list. The interview/competitive variant of this problem expects in-place mutation. The judge here just compares the printed output, but the skill is the in-place pattern.
Where this shows up in the enterprise
- ClickHouse / Apache Pinot / Druid compaction:
ReplacingMergeTreeand friends collapse duplicate keys in sorted runs. - Stripe Radar deduplication: time-sorted event streams with synthetic replays remove duplicates before scoring.
- Google Ads conversion attribution: post-sort deduplication of conversion events by ID.
- AWS Athena partition deduplication: each partition's sorted-by-key data has duplicates removed during repair.
- GitHub Actions step caching: sorted-by-step-key list, dedupe before resolving hits.
- Spotify "Wrapped" generation: per-user sorted listen events, collapse to unique tracks before counting plays.
- Bloomberg ticker history: sorted-by-timestamp tick stream where the same trade arrives twice from two feeds; the second is dropped via this pattern.
The shape is: sorted run, collapse adjacent duplicates in place. Any pipeline that emits sorted data eventually reaches for this.
Variants you'll see elsewhere
remove-duplicates-from-sorted-array-ii: allow each value up to twice. Same skeleton, compare tonums[k - 2]instead ofnums[k - 1].remove-element: filter out a specific value; same write-pointer trick.- move-zeroes: the un-sorted cousin, same write-pointer, different predicate.
remove-duplicates-from-sorted-list: linked-list version.unique-subsequence-count: variant that counts distinct values without mutation.
How parikshan turns this into a learning loop
After you read this article and click through to the practice link, the parikshan flow integrates four layers:
- The editor runs your code against public tests for instant feedback and against per-session dynamic private tests (generated freshly per session). Synthetic cases include
n = 1(always answer 1), all-same arrays (always answer 1), all-unique (answer n), and arrays with single duplicates scattered. - Hints are graduated. Level 1 prompts you about adjacency in sorted arrays. Level 4 hands you the full slow/fast recipe.
- The AI training assistant offers Explain, Review, Find Bugs, Add Comments, Optimise in exam mode at an integrity cost. In practice mode it is free.
- The dispute flow runs after the verdict. Most disputes here are "my answer is right but the output format is wrong", read the spec carefully.
Read this concept, attempt in practice mode, take a hint only when stuck, then redo in exam mode. When the slow/fast skeleton is muscle memory you can write it in under two minutes.
In the AI-integrated workspace
In your future job, the agent writes the code. Your job is to:
- Recognise the slow/fast pattern as the right tool any time you see "in place" plus "sorted" plus "filter or collapse". The recognition is the skill.
- Tell the agent precisely: "two pointers, write-index trails read-index, compare to the last accepted value". Not "remove duplicates", that gets you a
set(nums)which destroys order and uses extra memory. - Read the generated code and verify: the comparison is against
nums[k - 1](notnums[i - 1]),kstarts at1, and the buffer pastkis not printed. - Push the edge cases the agent will miss:
n == 1, all equal, all distinct, very large duplicates run-length.
Candidates who recognise the slow/fast pattern in five seconds turn agents into leverage. Candidates who can't end up approving code that allocates a fresh list and forgets the in-place requirement. parikshan is built to train the first kind.