Case: Stripe's Idempotency-Key Design
Era: 2014 to present · Author / source: Stripe Engineering blog, "Designing robust and predictable APIs with idempotency" (2017) and Stripe API Reference, "Idempotent requests" · Read alongside: retries, exactly-once semantics, distributed systems failure modes
The situation
Stripe processes payments. A payment API differs from most APIs in what its failures cost: a duplicate request moves real money twice, and a lost request leaves a customer who paid and got no service. Both failures are unacceptable, and the network between the merchant's server and Stripe's API prevents neither.
The classic problem: a merchant calls POST /v1/charges. The connection drops before the response comes back. The merchant does not know whether the charge succeeded. Retrying might charge the customer twice. Not retrying might lose the sale. At Stripe's volume, this is not an edge case; it is a recurring daily reality across millions of mutating requests.
Stripe needed a primitive that let clients retry mutating requests safely, without coordinating with their database, without distributed locks, and without changing the rest of the API surface. It had to be cheap enough that every merchant could use it, and strong enough that the platform could commit to "exactly once" semantics on retries.
The options on the table
A team building such a system in 2014 had a few credible alternatives:
- Server-generated request IDs returned in 5xx responses. The server tells the client what to retry. Problem: the response is exactly what got lost on the network.
- At-least-once with merchant-side deduplication. Push the problem onto every integrator. Reality: most merchants would get it wrong, and Stripe would absorb the support burden anyway.
- Distributed two-phase commit between merchant and Stripe. Strong correctness, terrible ergonomics, useless for the long tail of merchants writing PHP on shared hosting.
- Client-generated idempotency key in an HTTP header. Merchant generates a UUID once per logical operation, passes it on every retry. Server keys the operation by the merchant's UUID and replays the cached response on duplicates.
- Make POST endpoints idempotent on natural keys (e.g., charge ID). Forces every endpoint to carry a unique business identifier. Hard to retrofit, and the natural keys often do not exist at request time.
What they chose, and why
The fourth option: a client-generated idempotency key, passed via the Idempotency-Key HTTP header on POST requests. Stripe's API reference states the rules plainly: "All POST requests accept idempotency keys" and "Don't send idempotency keys in GET and DELETE requests because it has no effect. These requests are idempotent by definition."
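The client half of this contract can be sketched in a few lines. This is a minimal illustration, not Stripe's SDK code: `post_with_retries`, the `Transport` signature, and `flaky_send` are hypothetical names, and the only details taken from the text are the Idempotency-Key header and the V4 UUID. The point it shows is that the key is generated once per logical operation and reused on every attempt.

```python
import uuid
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical transport signature: (path, headers, body) -> (status, response body).
Transport = Callable[[str, Dict[str, str], Dict[str, str]], Tuple[int, dict]]


def post_with_retries(send: Transport, path: str, body: Dict[str, str],
                      max_attempts: int = 3) -> Tuple[int, dict, str]:
    """POST `body`, reusing one idempotency key across every attempt."""
    # Generated once per logical operation, NOT once per attempt.
    key = str(uuid.uuid4())
    headers = {"Idempotency-Key": key}
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            status, response = send(path, headers, body)
            return status, response, key
        except ConnectionError as exc:
            # The response was lost; the client cannot tell whether the
            # charge happened, so it retries with the same key.
            last_error = exc
    raise RuntimeError("all attempts failed") from last_error


# Demo with a fake transport that loses the first response.
seen_keys: List[str] = []


def flaky_send(path: str, headers: Dict[str, str], body: Dict[str, str]) -> Tuple[int, dict]:
    seen_keys.append(headers["Idempotency-Key"])
    if len(seen_keys) == 1:
        raise ConnectionError("response lost in transit")
    return 200, {"id": "ch_hypothetical"}


status, response, key = post_with_retries(flaky_send, "/v1/charges", {"amount": "500"})
```

Generating the key inside the retry loop would defeat the mechanism: each attempt would look like a new operation to the server.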
The mechanics, as Stripe documents them:
- The server saves "the resulting status code and body of the first request made for any given idempotency key, regardless of whether it succeeds or fails."
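- On a retry with the same key, "the server simply replies with a cached result of the successful operation." Because the first result is saved whether it succeeds or fails, a failed first attempt is replayed the same way.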
- The server fingerprints the request: "The idempotency layer compares incoming parameters to those of the original request and errors if they're not the same to prevent accidental misuse."
- Keys are bounded: "Idempotency keys are up to 255 characters long," and Stripe "suggest using V4 UUIDs."
- Retention is documented: "You can remove keys from the system automatically after they're at least 24 hours old," after which "We generate a new request if a key is reused after the original is pruned."
- Validation precedes commitment: "We save results only after the execution of an endpoint begins. If incoming parameters fail validation, or the request conflicts with another request that's executing concurrently, we don't save the idempotent result."
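The server-side rules above can be sketched as a small in-memory layer. This is an illustration under stated assumptions, not Stripe's implementation: the class name `IdempotencyLayer`, the SHA-256 fingerprint over canonical JSON, and the 400 and 409 status codes are all choices made for the sketch, and a plain dict stands in for a real store. It is single-threaded; a real server would need an atomic insert to detect concurrent requests.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Response = Tuple[int, dict]


@dataclass
class Stored:
    fingerprint: str
    response: Response


def fingerprint(params: dict) -> str:
    """Stable hash of the request parameters (key order does not matter)."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class IdempotencyLayer:
    def __init__(self, validate: Callable[[dict], bool],
                 execute: Callable[[dict], Response]) -> None:
        self._validate = validate
        self._execute = execute
        self._store: Dict[str, Stored] = {}
        self._in_flight: Set[str] = set()

    def handle(self, key: str, params: dict) -> Response:
        fp = fingerprint(params)
        saved = self._store.get(key)
        if saved is not None:
            if saved.fingerprint != fp:
                # Same key, different parameters: error out, never replay.
                return 400, {"error": "key reused with different parameters"}
            return saved.response  # replay without re-executing
        if key in self._in_flight:
            return 409, {"error": "a request with this key is in progress"}
        if not self._validate(params):
            return 400, {"error": "invalid parameters"}  # nothing is saved
        self._in_flight.add(key)
        try:
            try:
                response = self._execute(params)
            except Exception:
                response = (500, {"error": "internal error"})
            # Saved once execution has begun, whether it succeeded or failed.
            self._store[key] = Stored(fp, response)
            return response
        finally:
            self._in_flight.discard(key)


# Demo: one execution, one replay, one rejected parameter drift.
calls: List[dict] = []


def charge(params: dict) -> Response:
    calls.append(params)
    return 200, {"charge_number": len(calls), "amount": params["amount"]}


layer = IdempotencyLayer(validate=lambda p: p.get("amount", 0) > 0, execute=charge)
first = layer.handle("key-1", {"amount": 500, "currency": "usd"})
replay = layer.handle("key-1", {"currency": "usd", "amount": 500})
drifted = layer.handle("key-1", {"amount": 900, "currency": "usd"})
```

Because validation failures are never stored, a client that fixes its parameters can reuse the same key and still get a real execution.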
The design reasoning, as articulated in Stripe's 2017 engineering post, is that idempotency keys "guarantee the safety of distributed operations" without forcing the merchant to operate a transaction log of their own.
What they gave up
- A server-side correctness contract that survives forever. The 24-hour retention is a compromise. Beyond that window, the same key is a fresh request. A client retrying after a long outage can still cause a duplicate. Stripe accepted this rather than pay forever-storage costs on every key.
- Cross-request transactions. An idempotency key covers exactly one request. If a merchant needs to atomically create a customer and charge them, that is two keys and two retry windows. Stripe deliberately did not build a transactional API.
- Forgiveness on parameter drift. Because parameters are fingerprinted, a merchant who retries with a slightly different request gets an error instead of "did the right thing anyway." This trades flexibility for safety against accidental misuse.
- Magic for GETs and DELETEs. Stripe explicitly says these are "idempotent by definition" and that sending a key with them "has no effect." A merchant who attaches idempotency keys uniformly and expects them to matter everywhere will be surprised.
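The first trade-off, bounded retention, is easy to see in a sketch. The code below is illustrative only: `ExpiringKeyStore` and `charge_once` are hypothetical names, the clock is injected so the example is deterministic, and pruning at exactly 24 hours is an assumption (the documentation only says keys are at least 24 hours old when removed).

```python
from typing import Callable, Dict, List, Optional, Tuple

RETENTION_SECONDS = 24 * 60 * 60  # assumed: prune at exactly the 24-hour minimum


class ExpiringKeyStore:
    """Idempotency results that stop being honored after the retention window."""

    def __init__(self, now: Callable[[], float]) -> None:
        self._now = now
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, key: str) -> Optional[dict]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        saved_at, response = entry
        if self._now() - saved_at >= RETENTION_SECONDS:
            del self._entries[key]  # pruned: the key is fresh again
            return None
        return response

    def put(self, key: str, response: dict) -> None:
        self._entries[key] = (self._now(), response)


# Demo with a fake clock measured in seconds.
clock = {"t": 0.0}
store = ExpiringKeyStore(now=lambda: clock["t"])
charges: List[int] = []


def charge_once(key: str, amount: int) -> dict:
    cached = store.get(key)
    if cached is not None:
        return cached  # replayed, no new charge
    charges.append(amount)
    response = {"charge_number": len(charges), "amount": amount}
    store.put(key, response)
    return response


first = charge_once("order-42", 500)
clock["t"] = 1 * 60 * 60   # retry after one hour: replayed
within = charge_once("order-42", 500)
clock["t"] = 25 * 60 * 60  # retry after 25 hours: treated as a new request
late = charge_once("order-42", 500)
```

The late retry produces a second charge, which is the duplicate the retention compromise accepts. A client that may retry past the window needs its own record of what it already sent.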
How it played out
Idempotency keys became the de facto pattern for payment-grade REST APIs. Square, Adyen, PayPal, and many neobanks adopted variations. The header name became near-canonical in the industry. Stripe's official SDKs auto-generate keys on retries with exponential backoff, which means most merchants get the safety property whether they understand it or not.
The 24-hour window has held up well: it is long enough to cover real-world retry storms (network partitions, datacenter failovers, deploy-induced timeouts), and short enough that storage stays bounded. The decision to fingerprint and reject mismatched parameters has caught a steady trickle of integration bugs that would otherwise have become duplicate charges or refund disputes.
The pattern's most quoted line, from Stripe's engineering writing, is that idempotency lets distributed systems be "robust and predictable" without requiring the client to be a distributed system. That framing, more than the mechanism itself, shaped how the next generation of payment APIs was designed.
Where it ties to this bank's patterns
- [[exactly-once-semantics]] is exactly what this primitive purchases, within a bounded window.
- [[retries-backoff-jitter]] is the client-side companion: SDK retries plus an idempotency key are safe to compose. Retries without a key risk duplicate charges, and a key without retries protects nothing.
- [[caching-deduplication]] applies on the server side; the idempotency store is conceptually a write-through cache keyed by client-generated UUIDs.
- Problem links: any system-design problem involving order placement, payment, ticket booking, or any mutating API at scale.
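For the retries-backoff-jitter companion, a minimal sketch of the delay schedule follows. It uses the "full jitter" variant (a uniform draw between zero and a capped exponential ceiling), which is one common choice and not a description of what Stripe's SDKs do; `backoff_delay` and its default `base` and `cap` values are assumptions for illustration.

```python
import random
from typing import List, Optional


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based), with full jitter."""
    source = rng if rng is not None else random
    # The ceiling doubles each attempt until it reaches the cap.
    ceiling = min(cap, base * (2 ** attempt))
    # Spread retries uniformly below the ceiling so clients do not retry in lockstep.
    return source.uniform(0.0, ceiling)


# Demo: a seeded generator gives a reproducible schedule.
seeded = random.Random(0)
schedule: List[float] = [backoff_delay(n, rng=seeded) for n in range(6)]
```

Each delay would precede a retry that carries the same idempotency key as the original attempt, which is what makes the retry safe to send at all.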
What a candidate should take away
- Push correctness primitives to the boundary where they are cheapest. The client knows when it is retrying; the server does not. Give the client the tool, not the server the burden.
- Bound your guarantees. A 24-hour retention window is a deliberate choice. Promising "forever" is a lie that becomes a cost line item.
- Fingerprint, do not just dedupe. Same key with different parameters is almost certainly a bug; treating it as success is dangerous.
- Make the safe path the easy path. Stripe's SDKs use idempotency keys automatically. The merchant does not have to know to be protected.
- Distinguish operations that are inherently idempotent from those that need an explicit token. GET and DELETE do not need a key; POST does. Conflating them confuses callers.
What an AI agent would not have got right
- An AI asked to "design a payments API" will almost certainly produce something with at-least-once semantics and no story about retries. It treats payments like REST CRUD.
- It will reach for distributed transactions (Saga, two-phase commit) before considering the simpler idempotency-key pattern, because long-form training data overweights elaborate solutions.
- It will not fingerprint parameters. The first version will happily replay a cached response even if the merchant accidentally changed the amount, because "the cache key matched."
- It will not bound retention, which silently turns the idempotency store into the largest table in the database within a year.
- It will not push the key generation to the client. The first instinct is "the server should generate a unique ID," which is exactly the design that fails when the network drops the response.
Sources
- Stripe API Reference, "Idempotent requests": https://docs.stripe.com/api/idempotent_requests
- Stripe Engineering blog, "Designing robust and predictable APIs with idempotency": https://stripe.com/blog/idempotency