binary-search
Difficulty: Easy · Topic: Binary Search · Practice it on parikshan: /practice/binary-search/start
The story
A betting platform stores 10 million accepted bets sorted by timestamp. When a regulator asks "what was the first bet placed after 03:14:00 today?", the database cannot scan all 10 million; it has to land on the answer in under 25 comparisons. The trick has been the same since the 1940s: keep halving the search space.
That is binary search. The textbook framing ("find x in a sorted array") undersells it; the real value is that any monotone yes/no question can be answered in log₂(n) steps. Once you see this, you find binary searches in places you didn't expect: tuning a parameter, finding the breakpoint, deciding capacity.
What's actually being asked
Given a sorted array of integers nums and an integer target, return the index of target if it is present; otherwise return -1.
The naive approach
Walk the array. O(n). For n=10 million, that is 10 million comparisons. Binary search reduces it to 24.
The key insight
The array is sorted. So if you look at the middle element and it is less than the target, the target cannot be in the left half. Eliminate the left half. Repeat. Each step halves the remaining range. After ⌈log₂ n⌉ steps the range has length 1.
The trick is that the mental model is wrong until you see binary search as finding a boundary under a monotone predicate, not "find a number in a sorted array". The predicate "is this element ≥ target?" is True at the answer and False to its left. The False/True boundary is what we want.
Step-by-step approach
The template that does not lie:
def search(nums, target):
lo, hi = 0, len(nums) # hi is EXCLUSIVE; "answer is in [lo, hi)"
while lo < hi:
mid = (lo + hi) // 2 # left-biased midpoint
if nums[mid] >= target: # the monotone predicate
hi = mid # mid might be the answer; keep it
else:
lo = mid + 1 # mid is definitely too small
# lo == hi; "first index with nums[idx] >= target"
if lo < len(nums) and nums[lo] == target:
return lo
return -1
Three commitments that make this template invincible:
hiis exclusive. The invariant is "answer is in[lo, hi)".- The predicate is "True if
nums[mid] >= target". It is True at the answer and at every index to its right. - When the loop exits,
lo == hiis the first index where the predicate is True. Check whether that index actually holds the target.
Worked example
nums = [-1, 0, 3, 5, 9, 12]
target = 9
- lo=0, hi=6 (length 6)
- mid=3, nums[3]=5, 5<9 → lo=4
- lo=4, hi=6
- mid=5, nums[5]=12, 12≥9 → hi=5
- lo=4, hi=5
- mid=4, nums[4]=9, 9≥9 → hi=4
- lo=4, hi=4 → exit
- nums[4]=9 == target → return 4. ✓
Counter-case: target=7. Same execution as above gets to lo=hi=4, nums[4]=9 ≠ 7 → return -1.
Complexity
- Time: O(log n).
- Space: O(1).
Common pitfalls
The four classics:
- Inclusive
hiconfusion: if you sethi = n - 1and usewhile lo <= hi, you must also writelo = mid + 1andhi = mid - 1(both adjust off the midpoint). The half-open variant in this article is harder to write off-by-one bugs into. mid = (lo + hi) // 2overflow in fixed-width integer languages: for very largelo + hi, the sum can wrap. Safe form:mid = lo + (hi - lo) // 2. Python is fine; C, Java, Rust, Go need care.lo = midcauses an infinite loop: whenlo == hi - 1,mid == lo, and you setlo = mid, you never advance. Always eitherlo = mid + 1orhi = mid(one of the two strictly shrinks the range).- Forgetting the final equality check: the loop finds the boundary, not the match. After the loop you must check
nums[lo] == targetfor the exact-match version.
Where this shows up in the enterprise
- Database B-tree lookups: every primary-key index in PostgreSQL, MySQL, MongoDB, DynamoDB is a tree of binary searches.
git bisect: binary search over commit history to find the first broken commit.- Capacity planners: smallest server count that holds the load is binary search on a feasibility predicate.
- Datadog / Grafana percentile queries: t-digest and HDR-histogram lookups are binary searches over centroids.
- Kafka offset by timestamp: returning the first offset at or after a given time uses binary search over segment indexes.
- Auto-tuning ML training batch size: largest batch that fits in GPU memory; binary-search the answer space.
- Stripe rate-limit token allotment: largest quota that does not violate fairness; binary search.
- CDN cache warm-up: smallest TTL that keeps hit rate above a target; binary search.
- Pricing experiments: lowest price that keeps conversion above threshold.
When you see "smallest X such that ..." or "largest X such that ..." in a spec, the bell should ring.
Variants you'll see elsewhere
search-insert-position: same template; returnloeven on miss.first-bad-version: binary search on a predicate function, not an array.sqrt-x: binary search the answer space (the integers from 0 to x).find-min-in-rotated-sorted-array: binary search when the array is rotated; the predicate becomes "is this index past the rotation?".search-in-rotated-sorted-array: harder predicate, same template.find-peak-element: binary search on "is this slope going up?".
How parikshan turns this into a learning loop
Binary search is the topic where the AI training mode is most useful and most dangerous. Useful because if you ask the assistant for an explanation, it can describe the invariant cleanly. Dangerous because if you ask for code, the agent will routinely produce a working but inconsistent template (sometimes inclusive, sometimes exclusive, sometimes off by one).
The parikshan loop recommendation: solve binary-search, search-insert-position, first-bad-version, and sqrt-x in practice mode with no AI. Only after all four are AC do you turn on AI training and ask it to Optimise or Review a solution. The point is to lock the template into your fingers; only then can you tell when the agent's template is correct.
The proctor and integrity engine track which features you used. A submission that uses no AI carries no AI tag; a submission that used AI is annotated, so a future review can weigh that signal honestly.
In the AI-integrated workspace
In production, binary search is the algorithm an agent will write most often, because so much enterprise code is "find the first row that ..." or "find the largest X that fits". The agent will get most of them right and a few catastrophically wrong, and the wrong ones will look like the right ones.
Your verification ritual on every agent-generated binary search:
- Ask: "what is the loop invariant?" The agent should state it in one sentence.
- Ask: "is
hiinclusive or exclusive?" The agent should know. - Trace the smallest non-trivial input (length 0, length 1, length 2). Generated code that handles length 3+ correctly often dies on length 0.
- Trace an input where the target is not present. The
-1(or sentinel) return must be reached.
Engineers who can do that ritual in under a minute are the ones who ship correct binary-search code in seven languages despite each having different integer semantics. parikshan trains exactly that ritual.