majority-element
Difficulty: Easy · Topic: Arrays & Hashing · Practice it on parikshan: /practice/majority-element/start
The story
A reliability engineer at Cloudflare runs the leader election for a fleet of edge configuration servers. Each second, the proxy nodes poll a quorum of replicas and report which leader they observe. The orchestrator needs the consensus leader, the one strictly more than half of replicas agree on, and it must compute that consensus in a single pass over a stream of replica observations. Memory is tight (the orchestrator process runs on every edge POP), and the observations arrive as a stream you cannot rewind. A hash map of counts would work, but at 1 million POPs with hundreds of replicas, the constant-factor memory really does matter.
That is majority-element. The problem promises that one value appears more than half the time. You are asked to find it; the elegant solution finds it in constant memory.
What's actually being asked
Given an integer array nums of length n, return the majority element: the value that appears strictly more than n / 2 times. The problem guarantees a majority element exists. No need to verify, no need to handle the "no majority" case (though see pitfalls).
The naive approach
Count with a hash map: one pass to count, one pass to find the max. O(n) time, O(d) space where d is the number of distinct values. Correct, fine for n = 10^5. The only reason to do better is the memory argument.
Sort and pick the middle: after sorting, the majority element must occupy index n / 2 because it fills more than half the array, so it spans the middle. O(n log n) time, O(1) extra space (in-place sort). Clever, slower than counting.
The key insight
Think of the array as a sequence of votes. Pair every "minority" vote with one "majority" vote and cancel them both. Because the majority strictly exceeds half, it cannot be fully cancelled. At least one majority vote survives whatever pairing you choose. That is Boyer-Moore voting.
Algorithmically: maintain a single candidate and a counter.
- When the counter is 0, adopt the current value as the new candidate.
- If the current value matches the candidate, increment.
- Otherwise, decrement.
After the pass, the surviving candidate is the majority element. One pass, two scalar variables, no auxiliary data structure.
The proof: every time cnt returns to 0, the prefix consumed so far has a perfect tie (some value cancelled all majority votes seen so far, which means the prefix has equal minority and majority). The remaining suffix still has majority strictly greater than half its own length (because at most half the original majority's lead was used up; the strict inequality preserves). The recursion bottoms out at a suffix where the majority is unchallenged, locking the candidate.
Step-by-step approach
- Initialise
cand = None,cnt = 0. - For each
xinnums:- If
cnt == 0, setcand = x,cnt = 1. - Else if
x == cand,cnt += 1. - Else,
cnt -= 1.
- If
- Return
cand.
def majority_element(nums):
cand = None
cnt = 0
for x in nums:
if cnt == 0:
cand = x
cnt = 1
elif x == cand:
cnt += 1
else:
cnt -= 1
return cand
Worked example
nums = [2, 2, 1, 1, 1, 2, 2]
| i | x | cnt before | branch | cand after | cnt after |
|---|---|---|---|---|---|
| 0 | 2 | 0 | adopt | 2 | 1 |
| 1 | 2 | 1 | match, inc | 2 | 2 |
| 2 | 1 | 2 | mismatch, dec | 2 | 1 |
| 3 | 1 | 1 | mismatch, dec | 2 | 0 |
| 4 | 1 | 0 | adopt | 1 | 1 |
| 5 | 2 | 1 | mismatch, dec | 1 | 0 |
| 6 | 2 | 0 | adopt | 2 | 1 |
Final cand = 2. Verify: 2 appears 4 times out of 7, which is more than 7 / 2 = 3.5. Correct.
Complexity
- Time: O(n), one pass.
- Space: O(1), two scalar variables.
Hash-map count was O(n) time, O(d) space. Sort-and-middle was O(n log n) time, O(1) space. Boyer-Moore is the only one that hits both O(n) time and O(1) space at once.
Common pitfalls
- Forgetting the problem guarantees a majority exists: if the guarantee is removed (some variants don't promise one), you must run a second pass to verify the candidate's count. Boyer-Moore returns something; only the guarantee makes it the right something.
- Adopting too eagerly: setting
cand = xon every iteration regardless ofcntresets the algorithm at every step and always returns the last element. - Confusing "majority" with "most frequent": "more than half" is stricter than "most frequent". Boyer-Moore solves the first, not the second. If asked for the most frequent in general, use counting.
- Negative counters:
cntcannot drop below 0 because of theif cnt == 0branch ahead. Confirm your control flow respects the precedence. - Wrong tie-break: with two values both appearing exactly
n / 2times, neither is a majority. Boyer-Moore returns one of them; the spec says this case will not occur. If it does in your variant, your algorithm is silently wrong. - Trying to extend to "more than n/3" without thinking: a "more than
n/3" version exists (Boyer-Moore generalised to at most two candidates), but it requires two counters and post-verification. Don't shoehorn the single-counter version onto it.
Where this shows up in the enterprise
- Cloudflare edge leader election: the story above.
- Raft consensus / etcd leader confirmation: a node checks whether a majority of peers acknowledge the current leader; the streaming version of this algorithm is the gossip layer.
- Kubernetes admission controllers: when many policy plugins vote on a request, the consensus decision uses a streaming majority count.
- YouTube / Twitch chat sentiment: cheap pre-filter to find the dominant reaction emote in a streaming window before invoking a heavier model.
- Stripe Radar fraud signals: among many weak signals voting "fraud" or "ok", the dominant vote is the input to the policy engine.
- Akamai geo-routing: each request observes which DNS resolver it bounced through; the dominant resolver per region is the consensus path.
- GitHub Actions self-hosted runners: when many shards report a build verdict, the orchestrator picks the majority verdict.
- Bloomberg consolidated tape: the consensus quote across exchanges for the same symbol uses a streaming dominant-value detector for outlier rejection.
Variants you'll see elsewhere
majority-element-ii: find all elements appearing more thann/3times; at most two such elements. Boyer-Moore with two candidates plus a verification pass.find-the-celebrity: graph variant; the celebrity is the "majority" node in a specific sense (known by all, knows none).online-majority-element-in-subarray: range queries, requires segment-tree-with-candidate structure.find-the-duplicate-number: different shape but similar "use array invariants" flavour; often solved with Floyd's cycle detection.single-number: every element appears twice except one; XOR is the constant-memory trick.single-number-ii: every element appears thrice except one; bit-by-bit modular counting.
The single-pass constant-memory family is small but powerful. Recognising membership in it is a senior-engineer pattern.
How parikshan turns this into a learning loop
Read this article, then click into practice. The editor runs your code against public tests plus per-session dynamic private tests that are regenerated each attempt, so memorising a gist does not pass. Hints are graduated: free first hint, small penalty for the second, larger for the algorithmic skeleton. The AI training assistant in exam mode offers Explain, Review, Find Bugs, Add Comments, Optimise, each with a small integrity cost so you spend AI deliberately. In practice mode AI chat is free, the right place to talk through the cancellation argument. After the verdict, the dispute flow sends contested results to a human reviewer in one click. Attempt in practice mode first, then redo in exam mode untouched. When you can write the seven-line solution in two minutes and explain why the candidate survives, the pattern is in your fingers.
In the AI-integrated workspace
An AI agent will reach for Counter(nums).most_common(1)[0][0] because it is one line. That works at this scale, but for streaming inputs (Cloudflare consensus, Kafka topics, real telemetry) the agent has shipped a memory-unbounded solution. Your job is to:
- Recognise when the problem is stream-shaped. "More than half the time", "majority over N samples", "consensus" are all signals to demand Boyer-Moore.
- Verify the agent included the candidate-survives proof in its mental model. Generated code often returns a random element when no majority is guaranteed; agents skip the "verify after" pass.
- Check the control flow: the algorithm must adopt when
cnt == 0before comparing, otherwise the very first element is never adopted. - For variants (
>n/3, etc.), the agent will copy the single-candidate logic and silently produce wrong answers. Push back on the algorithm change, not just the constant.
A senior engineer reads "streaming consensus, more than half, O(1) memory" and immediately maps to Boyer-Moore. The candidate who can do that directs AI agents productively and ships memory-safe code on every stream. The candidate who cannot deploys a hash-map count that OOMs the orchestrator at scale. parikshan trains the first kind.