Bit Manipulation
The core idea
Bits are flags. With a 32-bit or 64-bit integer you can store 32 or 64 boolean facts in one register and operate on all of them in a single CPU instruction. That is the entire reason the topic exists. Five operators cover almost everything:
&AND: both bits set.|OR: at least one bit set.^XOR: bits differ. Self-inverse:a ^ b ^ b == a.~NOT: flip all bits.<</>>shift: multiply or divide by 2.
Two key tricks:
- XOR cancellation:
x ^ x == 0. Apply XOR to a list where every element appears twice except one, and you get the lonely one. This finds the single number in linear time and constant space. - Brian Kernighan's bit-clear:
n & (n - 1)removes the lowest set bit. Repeat until zero; the count of iterations is the number of set bits.
When to reach for it
Trigger phrases:
- "every element appears twice except one"
- "count the set bits / Hamming weight"
- "constant space"
- "without using extra memory"
- problems where a hash set would normally be the answer, but extra space is forbidden
Bit tricks are also the right reach for sets of small fixed size (a 32-element universe fits in one int, 64 in one long).
How to approach
- Write what each bit represents in one sentence.
- Decide which operator combines two bit-sets the right way (AND, OR, XOR).
- Test on tiny inputs (length 1, 2, 3). Bit-trick code that survives length 3 usually survives in general.
- If the language is anything other than Python (which auto-promotes), watch for overflow.
Enterprise scenarios
- Permissions systems (AWS IAM, Linux file perms, Discord role mask, GitHub team perms): each permission is a bit; a user's permission set is one integer.
- Feature flags at scale: Booking.com famously runs thousands of feature flags as a packed bitmask per request.
- Bloom filters: massively-deployed approximate set-membership data structure, built from bit arrays. Cloudflare, Akamai, Google, every CDN uses them at the edge.
- Database query optimisation: bitmap indexes (e.g., PostgreSQL bitmap heap scans) combine multiple WHERE conditions via bit AND.
- Compression: every codec is a bit twiddler (LZ77, gzip, Snappy, brotli).
- Network protocols: TCP flags, IP header fields, every wire format you have read.
- Game state in puzzle solvers (Rubik's, sudoku): pack state into a few words to fit billions of states in RAM.
In the AI-integrated workspace
Bit-twiddling code is the place where AI agents produce the most elegant or the most broken code, with no middle ground. Verifying it requires reading every operator.
Pitfalls to check:
- Sign extension:
>>arithmetic vs logical shift differs across languages. In Java>>>is logical; in C and Python signed-vs-unsigned behaves differently. - Bit-width mismatches: a 32-bit input shifted by 33 is undefined behaviour in C; in Python it returns the right answer because integers are arbitrary-precision. Generated code that "works in Python" can ship in another language and fail.
- XOR proof: agents will produce
xor everythingsolutions and you must verify the proof: every element except one appears an even number of times.
This is also the cluster where AI agents most often misclassify a problem. They will reach for hashing on a problem that has a known O(1)-space XOR solution. When you read agent code for "find the single number", check whether a hash set was used; if so, ask for the XOR version, and learn the trick.
Problems in this cluster
Easy: number-of-1-bits, single-number.
Both are small. Start with either; one will lead naturally to the other.