parikshan

first-unique-character

Difficulty: Easy  ·  Topic: Arrays & Hashing  ·  Practice it on parikshan: /practice/first-unique-character/start

The story

A platform engineer at Twilio is generating short, human-readable error codes for thousands of telemetry events. Every error gets a four-character mnemonic. The team wants the first character of the mnemonic that does not collide with any other character in the same string, because that "anchor" character is what support engineers grep for when they triage tickets. Find a character that appears exactly once in the mnemonic, and pick the earliest one. If every character collides, the mnemonic gets flagged for regeneration. The check runs on every code at lookup time, so it must be linear.

That is first-unique-character. It is the smallest, sharpest version of the question "what is the first thing in this sequence that does not repeat?". Once you can write it, the same logic powers spam scoring, duplicate-username detection, and dozens of streaming-uniqueness applications.

What's actually being asked

Given a string s of lowercase English letters (possibly empty), return the index (0-based) of the first character that does not repeat anywhere else in s. If every character repeats, return -1.

The naive approach

For each index i, scan the rest of the string to ask "does s[i] appear elsewhere?". O(n^2). At n = 10^5, that is 10^10 operations and dead on arrival.

A second naive variant: for each index i, count occurrences of s[i] in s via s.count(s[i]). Same O(n^2), hidden inside the standard library. Looks innocent, performs identically badly.

The key insight

One pass cannot answer the question, because at any character you do not yet know whether it will repeat later. The information lives in the future. So you take two passes.

  • Pass 1: count each character's frequency.
  • Pass 2: walk the string in order; the first index whose character has count 1 is the answer.

That is it. Linear in time, constant in extra space (since the alphabet has 26 letters).

A single-pass alternative exists if you track first-seen index plus a uniqueness flag per character, but it has the same asymptotic cost and more bookkeeping. Two passes is the right answer.

Step-by-step approach

  1. Build counts = Counter(s) (or a 26-length array indexed by ord(ch) - ord('a')).
  2. For i from 0 to len(s) - 1:
    • If counts[s[i]] == 1, return i.
  3. Return -1.
from collections import Counter

def first_uniq_char(s):
    counts = Counter(s)
    for i, ch in enumerate(s):
        if counts[ch] == 1:
            return i
    return -1

Worked example

s = "loveleetcode"

Counts:

chcount
l2
o2
v1
e4
t1
c1
d1

Walk:

ichcountunique?
0l2no
1o2no
2v1yes → return 2

The first character with count 1 is v at index 2. The walk stops immediately on the first hit.

For s = "aabb", every character has count 2, the walk completes with no hit, return -1.

Complexity

  • Time: O(n), two linear passes.
  • Space: O(sigma), the alphabet. With lowercase English letters, that is O(26) = O(1).

Brute force was O(n^2) time, O(1) space. The two-pass solution is asymptotically optimal: every character must be inspected at least once to learn its count.

Common pitfalls

  • Trying to do it in one pass with just a "seen" set: you cannot tell on the spot whether the current character will repeat later. Single-pass solutions need to record first-seen-index plus a uniqueness flag per character, which is more code than two passes.
  • Returning the character instead of the index: read the spec; the answer is an integer index, not the character.
  • Forgetting the -1 return: when every character repeats, you must explicitly return -1. Falling off the loop with no return is silently wrong in Python (returns None).
  • Using a list with s.index(ch) calls: every index call is O(n); the total becomes O(n^2). Avoid it.
  • Off-by-one in the count comparison: counts[ch] == 0 or counts[ch] <= 1 both bug; only == 1 is correct.
  • Unicode: the problem restricts to lowercase letters. In production, real text will include accents, emoji, combining sequences. Use a hash map, not a fixed-length array.

Where this shows up in the enterprise

  • Twilio mnemonic generation: the story above.
  • AWS / Google Cloud resource-tag uniqueness: when generating short resource names, identify the first non-colliding token in a candidate set.
  • Discord username collisions: when suggesting alternates, pick the first character of the username that does not collide with other recent signups.
  • Slack emoji shortname conflicts: in a workspace with many custom emojis, the first non-colliding letter is the anchor for sort-friendly autocomplete.
  • Spotify track-id collision detection: in a hash collision recovery routine, find the first character in the hash that differs across all collisions; same problem with characters as votes.
  • Datadog log-template fingerprinting: tokens that appear in every variant of a log template are filler; the first "unique" token is the anchor for clustering.
  • Adobe Acrobat OCR confidence boost: in noisy OCR output, characters that appear only once per word are the most informative for downstream language models.
  • Stripe Radar feature engineering: among token-level features of a transaction memo, the first feature that does not appear in any other recent memo is a strong novelty signal.

Variants you'll see elsewhere

  • first-unique-number-in-a-stream: same problem on a streaming sequence; maintain an ordered map of unique candidates and a count map.
  • find-the-difference: detect the one extra character inserted into a permutation; count-based.
  • find-all-duplicates: the inverse, list every character with count >= 2.
  • subdomain-visit-count: counting strings across hierarchical keys; same Counter foundation.
  • first-letter-to-appear-twice: the dual; first character whose count crosses 2, found in a single pass.
  • repeated-dna-sequences: hash-based duplicate detection over fixed-length substrings.

The two-pass count-then-scan idiom is one of the smallest, most reusable patterns in arrays-and-hashing.

How parikshan turns this into a learning loop

Read this article, click into practice. The editor runs your code against public tests and per-session dynamic private tests that are regenerated each attempt, so memorising a gist fails. Hints are graduated: free first hint, small penalty for the second, larger for the third. The AI training assistant in exam mode offers Explain, Review, Find Bugs, Add Comments, Optimise, each with a small integrity cost so you spend AI deliberately. In practice mode AI chat is free, the right place to debate single-pass vs two-pass. After the verdict, the dispute flow sends contested results to a human reviewer in one click. Attempt in practice, then redo in exam mode without AI. When you can write the four-line solution in two minutes from a blank screen, the pattern is internalised.

In the AI-integrated workspace

An AI agent will write the two-pass solution correctly nine times in ten. The failure modes that survive code review are:

  1. Returning the character instead of the index: ten lines later in a downstream call, the integer is needed; the bug surfaces in production logs as TypeError: list indices must be integers or slices, not str.
  2. Quietly using s.count(ch) inside the loop: looks one-line and elegant, but is O(n^2). On the sample input of length 12, it passes; at n = 10^5, the deploy times out.
  3. Skipping the empty-string case: the spec allows s = ""; the agent returns -1 correctly by coincidence (loop has nothing to iterate), but if it tried to access counts[s[0]] first, it crashes.
  4. Single-pass attempts: agents sometimes invent a one-pass "track first index and mark as seen" version. It works, but the code is twice as long and twice as easy to get wrong; ask for the two-pass version.

A senior engineer reads "first non-repeating character" and immediately maps it to Counter-then-scan. The candidate who can do that productively writes correct code in one pass and reviews generated code in another. The candidate who cannot misses the s.count() trap and pushes a quadratic algorithm through to production. parikshan trains the first kind.