Case: Uber Built H3 to Index the World as Hexagons
Era: 2016 to 2018 (open source release) · Author / source: Uber Engineering blog, "H3: Uber's Hexagonal Hierarchical Spatial Index" (2018); h3geo.org documentation · Read alongside: geospatial indexing, marketplace dynamics, spatial joins
The situation
Uber is a marketplace. Riders are on one side, drivers on the other, and the marketplace clears continuously in physical space. Surge pricing, ETA estimation, supply-demand matching, and batched trip dispatch all depend on the ability to answer questions like: "How many drivers are currently within 2 km of this rider?" or "What is the average wait time in this neighborhood right now?"
The standard ways to slice geographic space are:
- Lat/lon rectangles (geohash) and cube-projected quadrilaterals (S2). Cheap to compute and easy to index in a key-value store, but the cells vary in shape and size across the globe; geohash cells in particular narrow sharply toward the poles.
- Postal codes, administrative zones. Human-meaningful, but politically defined, irregular in shape, and useless for marketplace math.
- Triangular grids. Three neighbor distances per cell, awkward for proximity averaging.
Uber's marketplace teams needed a grid that was uniform in shape, hierarchical (so you could roll up small cells into bigger ones), globally indexed (so the same code worked in Lagos and London), and computationally cheap. Square grids failed the uniformity test: they have two neighbor distances, and on a lat/lon grid their true ground shape also varies with latitude. Geohash and S2 came close but had quirks: geohash cells change shape with latitude, and S2 cells are square-ish quadrilaterals on a cubed sphere with curvature artifacts.
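To make the latitude problem concrete, here is a minimal sketch that measures a cell spanning the same number of degrees in latitude and longitude at a few latitudes (roughly Lagos, London, and the Arctic). It assumes a spherical Earth of radius 6371 km and an equal-degree cell; real geohash cells alternate longitude and latitude bits, so their degree spans are not always equal, but the cos(latitude) shrinkage of east-west width is the same effect.

```python
import math
from typing import Tuple

# Length of one degree of latitude on a spherical Earth (radius 6371 km, assumed).
KM_PER_DEGREE = 2 * math.pi * 6371.0 / 360.0


def lat_lon_cell_size_km(lat_deg: float, cell_deg: float) -> Tuple[float, float]:
    """Return (north-south height, east-west width) in km of a cell that spans
    cell_deg degrees of latitude and of longitude, centered at lat_deg.

    East-west width uses cos(latitude) at the cell center, a fine
    approximation for small cells.
    """
    height = cell_deg * KM_PER_DEGREE
    width = cell_deg * KM_PER_DEGREE * math.cos(math.radians(lat_deg))
    return height, width


if __name__ == "__main__":
    for lat in (0.0, 6.5, 51.5, 70.0):
        h, w = lat_lon_cell_size_km(lat, 0.01)
        print(f"lat {lat:5.1f}: {h:.3f} km tall, {w:.3f} km wide, width/height {w / h:.2f}")
```

The same 0.01-degree cell is close to square near the equator but noticeably squashed at London's latitude, so any per-cell density or neighbor average silently means something different in each city.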
The options on the table
- Geohash. Already widely adopted, simple Base32 string encoding. Rectangular cells, distortion away from the equator.
- Google's S2. Hierarchical, used by Foursquare, MongoDB, and Google Maps internally. Quadrilateral cells defined on the six faces of a cube and projected onto the sphere. Reasonable, but not uniform in neighbor distance.
- Hexagonal grid, custom built. Hexagons are the unique convex regular polygon that tiles the plane with one neighbor distance (squares have two: edge-adjacent and corner-adjacent; triangles have three). Uniformity is intrinsic.
- Voronoi tessellation around points of interest. Locally optimal but expensive and non-hierarchical.
- No global grid; use lat/lon directly with radius queries. Brutally expensive at marketplace scale.
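The neighbor-distance argument behind the hexagon option can be checked directly. This sketch places unit squares on an integer lattice and pointy-top hexagons (circumradius 1) on an axial (q, r) lattice, then collects the distinct center-to-center distances from one cell to every cell it touches. The coordinate conventions are assumptions chosen for illustration, not part of H3.

```python
import math
from typing import Callable, Iterable, Set, Tuple

Offset = Tuple[int, int]

# Every cell that touches the origin square, by edge or by corner.
SQUARE_NEIGHBORS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]

# The six axial (q, r) neighbors of a hexagon; hexagons never touch by corner only.
HEX_NEIGHBORS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def square_center(dx: int, dy: int) -> Tuple[float, float]:
    """Unit squares: cell centers sit on the integer lattice."""
    return float(dx), float(dy)


def hex_center(q: int, r: int) -> Tuple[float, float]:
    """Pointy-top hexagons with circumradius 1: axial -> Cartesian center."""
    return math.sqrt(3) * (q + r / 2), 1.5 * r


def distinct_neighbor_distances(
    offsets: Iterable[Offset], to_xy: Callable[[int, int], Tuple[float, float]]
) -> Set[float]:
    """Distinct center-to-center distances from the origin cell to its neighbors."""
    return {round(math.hypot(*to_xy(a, b)), 9) for a, b in offsets}


if __name__ == "__main__":
    print("square:", sorted(distinct_neighbor_distances(SQUARE_NEIGHBORS, square_center)))
    print("hexagon:", sorted(distinct_neighbor_distances(HEX_NEIGHBORS, hex_center)))
```

Squares yield two distances (edge and corner neighbors), hexagons exactly one, which is why a neighbor average on a hex grid needs no per-neighbor distance weighting.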
What they chose, and why
H3, a hexagonal hierarchical spatial index built from scratch. Open-sourced under Apache 2.0 in 2018.
The reasoning Uber published:
- One neighbor distance. Hexagons have "only one distance between a hexagon centerpoint and its neighbors" compared to two distances (edge vs. corner) for squares and three for triangles. This "greatly simplifies performing analysis and smoothing over gradients." For marketplace math that averages over neighbors, uniform neighbor distance is not a luxury; it removes a class of correction factors.
- Quantization error. Hexagons "minimize the quantization error introduced when users move through a city." A driver moving in a straight line through a city crosses hexagonal cell boundaries in a more regular pattern than geohash boundaries.
- Hierarchy. H3 supports 16 resolutions (0 through 15), and "each finer resolution has cells with one seventh the area of the coarser resolution." This lets the same data feed surge analysis at neighborhood scale and city-wide analysis at metro scale.
- Radius approximation. Hexagons "allow us to approximate radiuses easily." Asking for "drivers within 2 km" reduces to a known number of rings of cells around the rider's cell.
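A planar sketch of the radius-to-rings idea, using axial hex coordinates rather than real H3 cells: `grid_disk` returns every cell within k steps (mirroring the name of H3's v4 function, but this is not H3's implementation), and `rings_for_radius` picks a conservative k. The center spacing passed in is a hypothetical input; real H3 spacing depends on resolution and varies slightly across the globe.

```python
import math
from typing import List, Tuple

Axial = Tuple[int, int]


def hex_distance(a: Axial, b: Axial) -> int:
    """Number of grid steps between two axial (q, r) coordinates."""
    dq, dr = a[0] - b[0], a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2


def grid_disk(center: Axial, k: int) -> List[Axial]:
    """Every cell within k steps of center, center included: 3k(k+1) + 1 cells."""
    cq, cr = center
    return [
        (cq + dq, cr + dr)
        for dq in range(-k, k + 1)
        for dr in range(max(-k, -dq - k), min(k, -dq + k) + 1)
    ]


def rings_for_radius(radius_m: float, center_spacing_m: float) -> int:
    """Conservative k for a radius query.

    A cell k steps away can sit as close as k * spacing * sqrt(3) / 2 to the
    center (midway along a ring's side), so this k includes every cell whose
    center lies within radius_m. Exact distances are filtered afterwards.
    """
    return math.ceil(radius_m / (center_spacing_m * math.sqrt(3) / 2))
```

With an illustrative 500 m center spacing, `rings_for_radius(2000, 500)` gives k = 5, a disk of 91 cells; the query becomes a set lookup over those cell IDs followed by an exact-distance filter on the few candidates near the edge.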
H3 is honest about its principal weaknesses. First, hexagons cannot tile a sphere perfectly. H3 builds its grid on an icosahedron, and each of the icosahedron's 12 vertices becomes a pentagon at every resolution; the icosahedron is oriented so that those vertices fall in the ocean, which keeps real marketplace queries away from the pentagons almost all the time. Second, seven hexagons cannot exactly cover one larger hexagon, so child cells do not nest perfectly inside their parents. The documentation calls this "exact logical containment but only approximate geometric containment across the cell hierarchy."
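Why the nesting is only approximate can be seen with a short planar derivation, a sketch that ignores the icosahedron's curvature. Take the finer grid's cell centers as a hexagonal lattice with unit spacing and basis vectors a and b at 60 degrees. In an aperture-7 hierarchy the coarser centers lie on the sublattice generated by c = 2a + b (and its 60-degree rotations):

```latex
\begin{aligned}
\mathbf{a} &= (1,\,0), \qquad \mathbf{b} = \left(\tfrac{1}{2},\,\tfrac{\sqrt{3}}{2}\right) \\
\mathbf{c} &= 2\mathbf{a} + \mathbf{b} = \left(\tfrac{5}{2},\,\tfrac{\sqrt{3}}{2}\right),
  \qquad \lVert \mathbf{c} \rVert^2 = \tfrac{25}{4} + \tfrac{3}{4} = 7 \\
\frac{A_{\text{parent}}}{A_{\text{child}}} &= \lVert \mathbf{c} \rVert^2 = 7,
  \qquad \theta = \arctan\frac{\sqrt{3}}{5} \approx 19.1^\circ
\end{aligned}
```

The coarse spacing is the square root of 7 times the fine spacing, which gives the one-seventh area ratio, but the coarse lattice is rotated by about 19.1 degrees. A parent hexagon's edges therefore cannot run along its children's edges, and the seven children can only approximate the parent's outline.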
What they gave up
- Perfect parent/child geometric nesting. Aggregating cells up one level is approximately correct, not exactly correct. For most marketplace and analytics queries this is fine; for some geographic computations it is a real foot-gun.
- Pentagons at the seams. At every resolution, twelve cells are pentagons rather than hexagons, with five neighbors instead of six. Code that assumes "every cell has six neighbors" silently breaks at those cells. H3 documents this; not everyone reads documentation.
- Compatibility with the rest of the geospatial ecosystem. PostGIS, S2, geohash, and most third-party tooling do not natively understand H3 cell identifiers. Uber paid an integration tax across their data platform.
- Industry familiarity. H3 is now well known, but in 2016 it was a new vocabulary every analyst and data scientist had to learn.
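The pentagon foot-gun usually shows up as a hard-coded denominator. This minimal sketch contrasts a neighbor average that divides by the cells it actually visited with one that assumes 1 + 6 cells. The `neighbors_of` callback is a hypothetical stand-in; with h3-py v4 you would build it from `grid_disk(cell, 1)` minus the cell itself, and `is_pentagon` can flag the special cases.

```python
from typing import Callable, Dict, Hashable, Iterable

Cell = Hashable
NeighborFn = Callable[[Cell], Iterable[Cell]]


def neighbor_average(cell: Cell, values: Dict[Cell, float], neighbors_of: NeighborFn) -> float:
    """Mean over the cell and its actual neighbors (missing values count as 0).

    The denominator is the real number of cells, so a pentagon with five
    neighbors is averaged over six cells and a hexagon over seven.
    """
    cells = [cell, *neighbors_of(cell)]
    return sum(values.get(c, 0.0) for c in cells) / len(cells)


def neighbor_average_assuming_six(cell: Cell, values: Dict[Cell, float], neighbors_of: NeighborFn) -> float:
    """Buggy variant: hard-codes 1 + 6 = 7 cells and under-weights pentagons."""
    total = values.get(cell, 0.0) + sum(values.get(n, 0.0) for n in neighbors_of(cell))
    return total / 7


if __name__ == "__main__":
    # Toy adjacency: "P" stands in for a pentagon with five neighbors.
    toy_neighbors = {"P": ["n1", "n2", "n3", "n4", "n5"]}
    flat = {c: 1.0 for c in ["P", "n1", "n2", "n3", "n4", "n5"]}
    print(neighbor_average("P", flat, toy_neighbors.__getitem__))
    print(neighbor_average_assuming_six("P", flat, toy_neighbors.__getitem__))
```

On a perfectly flat field the correct version returns 1.0 at the pentagon, while the hard-coded version reports 6/7, a silent dip that would look like a real local signal.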
How it played out
H3 became a real piece of geospatial infrastructure beyond Uber. Major adopters include analytics platforms, ride-sharing competitors, mapping companies, and the broader location-intelligence industry. The library has bindings in Java, JavaScript, Python, Go, R, and more, with a C core. It has stayed actively maintained as an open-source project since 2018.
Inside Uber, the documented use cases include surge pricing computation ("measuring supply and demand in hexagons in each city"), marketplace optimization, and spatial analytics across the data platform. Surge pricing, in particular, is a hexagon-shaped computation: for each cell, compute supply and demand within the cell and a ring of neighbors, and adjust the multiplier accordingly. The uniform-neighbor property makes that math clean.
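The per-cell computation described above can be sketched as follows. Everything about the pricing rule here is hypothetical (demand/supply over the cell plus its first ring, clamped to [1.0, cap], with cap = 3.0); the source does not describe Uber's actual surge model. Cells are planar axial (q, r) coordinates standing in for H3 cell IDs.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Axial = Tuple[int, int]

# Axial (q, r) offsets of the six neighbors in a planar hexagonal grid.
NEIGHBOR_OFFSETS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def ring_total(counts: Counter, cell: Axial) -> int:
    """Sum of counts over a cell and its six immediate neighbors."""
    q, r = cell
    return counts[cell] + sum(counts[(q + dq, r + dr)] for dq, dr in NEIGHBOR_OFFSETS)


def surge_multipliers(
    rider_cells: Iterable[Axial], driver_cells: Iterable[Axial], cap: float = 3.0
) -> Dict[Axial, float]:
    """Hypothetical multiplier for every cell that has demand."""
    demand, supply = Counter(rider_cells), Counter(driver_cells)
    result = {}
    for cell in demand:
        d, s = ring_total(demand, cell), ring_total(supply, cell)
        ratio = d / s if s else cap  # no nearby drivers: pin to the cap
        result[cell] = min(cap, max(1.0, ratio))
    return result
```

Because every neighbor is the same distance away, the ring sum is a plain count with no distance weights; swapping in a wider disk only changes the offset list.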
The 16 resolutions (0 through 15) span continental to roughly square-meter scale. Resolution 9 hexagons average about 0.1 km², roughly a city block; resolution 7 cells (about 5 km²) are closer to a neighborhood; resolution 0 consists of 122 base cells of roughly 4 million km² each. Most marketplace queries operate at resolutions 7 to 9.
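The cell counts behind these sizes follow from the hierarchy itself. In this sketch, each hexagon has 7 children and each of the 12 pentagons has 6, which gives a closed form for the number of cells at each resolution; dividing an assumed Earth surface area (about 510.07 million km², spherical approximation) by that count gives the average cell area. Real cells vary in area around these averages.

```python
# Assumed mean surface area of the Earth (spherical approximation), km^2.
EARTH_AREA_KM2 = 510.07e6


def cell_count(res: int) -> int:
    """Number of H3 cells at a resolution.

    Resolution 0 has 122 base cells, 12 of them pentagons. Each hexagon has 7
    children and each pentagon has 6 (there are 12 pentagons at every level),
    so N(r+1) = 7 * N(r) - 12, which with N(0) = 122 solves to 2 + 120 * 7**r.
    """
    if not 0 <= res <= 15:
        raise ValueError("H3 resolutions run from 0 to 15")
    return 2 + 120 * 7**res


def mean_cell_area_km2(res: int) -> float:
    """Average cell area: Earth's area spread evenly over all cells."""
    return EARTH_AREA_KM2 / cell_count(res)


if __name__ == "__main__":
    for res in (0, 6, 7, 8, 9, 15):
        print(f"res {res:2d}: {cell_count(res):>18,d} cells, ~{mean_cell_area_km2(res):.6g} km^2 each")
```

Each step down multiplies the count by almost exactly 7, which is why rolling a metric up or down one level is a predictable, constant-factor change in cardinality.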
Where it ties to this bank's patterns
- [[geospatial-indexing]]: the broader topic, including geohash, S2, and quadkeys.
- [[hierarchical-aggregation]]: the property that lets the same index serve both detailed and summary queries.
- [[marketplace-supply-demand]]: the business pattern this index is shaped for.
- [[hash-based-partitioning]] versus spatial partitioning: an H3 cell ID, usually truncated to a coarse parent resolution, is a useful partition key for geographic sharding.
- Problem links: ride-sharing dispatch, delivery zone routing, geographic feature engineering for ML.
What a candidate should take away
- Choose primitives that make the math you do most often cheap. Uber averages over neighbors all day. Hexagons make that math arithmetic, not geometry.
- Hierarchy matters more than absolute resolution. A grid you can roll up is a grid you can use at multiple zoom levels.
- Topology has limits. You cannot tile a sphere with hexagons alone. Acknowledge the seams and put them where they hurt least.
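- Cell IDs make great partition keys. Sharding on a coarse parent cell keeps nearby data on the same shard, so most neighborhood queries touch a single shard; queries near a parent's boundary still fan out to a few. Sorted H3 IDs group cells that share a parent, but neighbors across a parent boundary can be far apart in ID order, so ID ranges are not a general substitute for neighbor lookups.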
- Build the vocabulary your business actually uses. Surge zones are not rectangles in any human's head; they are roughly circular. Hexagons approximate circles better than squares do.
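A minimal sketch of "shard by coarse parent cell", written against the H3 index bit layout as documented (4-bit resolution field at bits 52 to 55, then 3-bit digits for resolutions 1 to 15 in the low 45 bits, with digits finer than the cell's resolution set to 7). In production you would call the library's `cell_to_parent` rather than manipulating bits; the `shard_for` helper, its hashing choice, and the shard count are hypothetical.

```python
import hashlib

RES_OFFSET = 52   # 4-bit resolution field
MAX_RES = 15
DIGIT_BITS = 3
UNUSED_DIGIT = 7  # digits finer than the cell's resolution are all ones


def get_resolution(cell: int) -> int:
    return (cell >> RES_OFFSET) & 0xF


def cell_to_parent(cell: int, parent_res: int) -> int:
    """Coarser ancestor of a cell: lower the resolution and blank finer digits."""
    res = get_resolution(cell)
    if not 0 <= parent_res <= res:
        raise ValueError("parent_res must be between 0 and the cell's resolution")
    # Overwrite the resolution field with the parent's resolution.
    parent = (cell & ~(0xF << RES_OFFSET)) | (parent_res << RES_OFFSET)
    # Mark digits parent_res+1 .. res as unused.
    for r in range(parent_res + 1, res + 1):
        parent |= UNUSED_DIGIT << ((MAX_RES - r) * DIGIT_BITS)
    return parent


def shard_for(cell: int, shard_res: int, num_shards: int) -> int:
    """Hypothetical shard assignment: stable hash of the coarse parent cell."""
    parent = cell_to_parent(cell, shard_res)
    digest = hashlib.blake2b(parent.to_bytes(8, "big"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_shards
```

Hashing the parent rather than the cell keeps every descendant of one coarse cell together while still spreading coarse cells evenly across shards. The trade-off is the boundary effect noted above: a query whose disk crosses a parent edge reads from more than one shard.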
What an AI agent would not have got right
- An AI prompted to "design a geospatial index for ride-sharing" will almost certainly suggest geohash or S2, because those are the dominant terms in long-form blog text. It will not invent hexagons unsolicited.
- It will miss the marketplace-math reason for hexagons. The pitch will be aesthetic ("hexagons are cooler") rather than functional (uniform neighbor distance simplifies averaging).
- It will not flag the pentagon problem. The first draft will say "every cell has six neighbors," which is wrong for the twelve pentagons present at every resolution.
- It will overstate parent/child nesting precision. "Each cell divides cleanly into seven smaller cells" is only approximately true (pentagons have six children, and children never exactly fill their parent's outline), and the gap matters for any code that assumes exact containment.
- It will not advise putting cell IDs in your warehouse schema as a first-class column. The leverage of "cell as a join key" is the kind of architectural decision that pays off across the whole data platform but does not surface in code-completion-style AI advice.
Sources
- Uber Engineering blog, "H3: Uber's Hexagonal Hierarchical Spatial Index" (2018): https://www.uber.com/blog/h3/
- H3 documentation: https://h3geo.org/docs/