Arrays & Hashing
The core idea
If you can recognise that you've already seen a value, you can answer questions about repetition, complementarity, and frequency in O(n) time, often replacing O(n²) brute-force scans. A hash set or hash map is the cheapest memory you can buy: roughly one extra integer of space per element, in exchange for O(1) lookup. That trade is the most useful trick in algorithmic programming.
Two sub-patterns dominate:
- Membership / dedup: a hash set tells you whether you've seen a value.
- Counting: a hash map from value to count tells you how often each one appears.
A close cousin, the prefix sum, lets you answer "what was the running total at index i" in O(1) after a single pass. Combine prefix sums with a hash map and you unlock subarray-sum questions.
When to reach for it
Trigger phrases in the problem statement:
- "find a duplicate" / "is unique"
- "count the frequency of"
- "two numbers that sum to" / "k pairs that satisfy"
- "longest substring with ..."
- "subarray that sums to k"
- "group these into ..."
If your first instinct is a nested loop, ask: what is the inner loop looking up? If the answer is "anything previously seen", hashing replaces it.
How to approach
- Read the problem and underline what is being looked up repeatedly.
- Decide: do I need to know whether it appeared (set) or how often (map)?
- Walk the array once. At each step: first ask the structure if it has answered the question; then update the structure with the current element.
- Confirm the order is correct. Most bugs in this pattern come from updating the structure before checking, which lets a value match itself.
Real-world applications
- Deduplication pipelines: ingest logs, drop duplicates by request ID.
- Recommendation systems: count item co-occurrences across users.
- Fraud detection: flag the second appearance of a one-time token.
- Symbol tables in compilers: variable name to declaration.
- Caching: any cache is a hash map, conceptually.
In the AI-integrated workspace
An AI agent will reach for for x in arr: for y in arr: more often than it should, because the brute force fits on one screen. When you read an agent's code on this kind of problem, the first thing to check is whether a hash set or hash map would have collapsed two loops into one. Saying "rewrite this with a hash map keyed on the complement" is a one-line review comment that turns O(n²) into O(n) on its way to prod.
The other failure mode: an agent producing a hash-based solution that mutates the structure mid-iteration. Always check the order of "lookup vs. insert", that ordering is invisible to most static analysers and is the bug that makes "works on the sample, fails on the duplicates" reviews so common.
Problems in this cluster
Easy: contains-duplicate, valid-anagram, two-sum, majority-element, first-unique-character, happy-number, missing-number, find-pivot-index. Medium: group-anagrams, top-k-frequent-elements, product-of-array-except-self, encode-decode-strings, subarray-sum-equals-k, longest-consecutive-sequence.
Start with contains-duplicate; it is the cleanest possible version of the pattern.