
Design a Surge / Dynamic Pricing System for a Two-Sided Marketplace

A Staff-level pricing problem: set an economic price that clears a two-sided marketplace in real time, where the price you set changes the demand you're forecasting — a closed control loop, not an auction, aggregation, or assignment. The bar is reasoning about anti-oscillation guardrails, incentive-compatible surge (additive vs multiplicative), and causal evaluation under interference, not just standing up a Kafka→Flink→KV pipeline. Companies like Uber, Lyft, DoorDash, and Amazon Flex build exactly this and interview AI/ML and infra candidates on it.

Level: Staff
Category: AI System Design
Interview time: 60 min


WHAT THIS QUESTION TESTS

·Can you set a price that clears the market when the price itself changes demand (closed feedback loop), vs treating demand as fixed exogenous input?

·Do you stop the multiplier from oscillating — smoothing, hysteresis, rate limits, neighbor-cell pooling — instead of a raw demand/supply ratio that pumps?

·Do you quote a consistent, idempotent price with an explicit TTL, so the rider isn't surprised and the same request can't be re-priced on retry?

·Do you evaluate pricing changes with a design that survives marketplace interference (switchback / cluster randomization), not a naive user-split A/B test?

★ STAFF-LEVEL SIGNALS

★Frames the multiplier as a market-clearing / control problem with an explicit objective and constraints — not a heuristic ratio.

★Knows additive surge is more incentive-compatible than multiplicative (Garg & Nazerzadeh, Management Science 2021) and can say why short vs long trips differ.

★Reasons quantitatively about the feedback loop: how price elasticity + driver repositioning latency create oscillation, and how aggregation window / damping trade stability against responsiveness.

★Designs the causal eval and the guardrail/fallback story (capped multiplier, last-known-good, geo-fenced caps for regulation) as first-class, not afterthoughts.

Frame the problem & who's asking

Surge is not an auction, an aggregation, or an assignment — it’s a controller that sets a price to clear a two-sided market, and the price you set changes the demand you’re trying to forecast.

That one sentence is the whole interview. Most candidates pattern-match surge onto a system they already know and get the category wrong. Get the category right and the hard parts fall out naturally; get it wrong and you spend 60 minutes optimizing a Kafka pipeline while the interviewer waits for the economics.

What this is distinct from

The trap is reaching for a familiar shape. Three near-neighbors, and why none of them fit:

Ad serving is an auction — many bidders compete for one slot; the mechanism discovers a price from bids. Surge has no bids; the platform sets the price.
A click/impression counter is an aggregation — you count events and serve a number. Surge does involve counting, but the counting is the easy 80%.
Ride matching / dispatch is an assignment — bipartite matching of riders to drivers. That system runs downstream of pricing and we assume it works.

Surge is none of these. It is economic price-setting under closed-loop feedback with anti-oscillation guardrails. The price is an actuator, the marketplace is the plant, and demand is a function of the very price you emit.

The one-sentence problem statement

Per geo-cell, every few seconds, forecast short-horizon demand and supply and emit a surge multiplier that balances the marketplace, served at quote time as a stable, honored, idempotent price the rider can act on 90 seconds later.

The objective — state it out loud

Do not say “don’t have unserved riders.” Say:

Maximize expected completed-trip value (GMV / contribution margin) subject to guardrails — rider wait time, rider cancel rate, driver earnings stability, and regulatory caps.

The two words that earn the Staff signal are completed and subject to. We are not maximizing instantaneous revenue — that prices riders out, starves the funnel, and burns trust. We optimize the trips that actually clear, under constraints that keep both sides of the market healthy.

The crux, called out up front

The central difficulty, named before any architecture: price affects demand. Raise the multiplier and rider demand falls (rider elasticity) while driver supply rises (driver elasticity, with a lag). So the demand forecast that feeds the controller is conditional on the price the controller is about to set. The controller sits inside the loop it is measuring; it is not an observer forecasting an exogenous quantity. Every later decision (forecast features, control damping, causal eval) traces back to this loop.

Scope cuts

One product line (e.g. an economy ride tier); one city's granularity.
Assume a working matching/dispatch system downstream.
Out of scope: the assignment algorithm itself, payments rails, fraud/identity. We consume "a trip happened" and "a driver is available" as given.

Who asks this & what they probe

| Role | What they own | What the interview probes |
|---|---|---|
| SDE | The real-time signal path | H3 cell aggregation through Kafka/Flink, sub-10ms multiplier lookup, idempotent/consistent quote that survives a 90s retry |
| MLE | The brains | Forecasting under feedback, price elasticity, the multiplier objective/reward, anti-oscillation in the loop, switchback causal eval under interference |
| Switcher (SDE to AI) | Bridge from serving to ML | Leads with the streaming/serving half they know, then must show why a raw ratio oscillates, why multiplicative surge isn't incentive-compatible, why a user-split A/B is biased |

If you’re the switcher, this is a strong bridge question: you already own the streaming/serving half. Anchor there, then earn the AI signal on three specific claims — the feedback loop, additive vs multiplicative surge, and interference in A/B tests. Those three are the rest of this answer.

Requirements & SLOs

Functional

Ingest rider-request and driver-availability events, keyed per geo-cell.
Forecast demand and supply over a 5–15 min horizon per cell.
Compute a surge multiplier and write it to a low-latency store.
Serve the multiplier at quote time and freeze the quote with a TTL.
Log all features, multipliers, and outcomes for offline training and causal eval.

Non-functional

Stability over twitchiness: the multiplier must not oscillate. Smooth, bounded, predictable.
Quote consistency: a price the rider saw must be honored within its TTL.
Idempotency: a retried request returns the same quote, never a re-price.
Fail safe: on missing/stale data, drift toward 1.0x; never surge from absence of signal.

SLOs

| SLO | Target |
|---|---|
| Multiplier available at quote time, end-to-end | p99 under 50 ms |
| Cached multiplier lookup itself | p99 under 5–10 ms |
| Surge state recompute cadence | every 2–5 s |
| Quote freeze TTL | 1–5 min |

Freshness vs stability

These two timers, the recompute cadence and the quote TTL, do different jobs. The multiplier is recomputed every few seconds (freshness) but smoothed (stability), while a quoted price is frozen for 1–5 min so a rider walking to the curb isn’t re-priced mid-checkout. A reality check on the loop’s time constant: real-world surge equilibration takes anywhere from ~5 min (a mild rush in a dense city) to hours. The controller’s job is to set a price, not to instantly clear the market, and that expectation governs how aggressively we’re allowed to react.

Capacity estimation

Numbers exist to size the hot path and to find the cost lever. Two quantities matter, each to an order of magnitude: events ingested per second, and the count of hot cells.

Geo-cells

Use H3, an open-source hexagonal hierarchical index originated at Uber. A practical working cell is roughly 0.1–0.5 sq mi (H3 res ~8 averages ~0.29 sq mi / ~0.74 km²). A large city is O(10^3–10^4) active cells; across all served markets O(10^7) cells are in play (the full res-8 grid covers the whole globe and is far larger), but only a fraction are hot at any instant, which is the central cost lever (Step 8).

Scale estimate

| Quantity | Order of magnitude |
|---|---|
| Trips/day, large market | millions |
| Pricing decisions/sec, global peak | 10^4 – 10^5 |
| Rider+driver events ingested/sec | millions |
| Active H3 cells, large city | 10^3 – 10^4 |
| Surge-state refresh cadence | every 2–5 s |

What the numbers imply

Ingest is millions of events/sec but cheaply shardable — every event carries an H3 cell, so it partitions perfectly with no cross-key coordination.
The hot read path is small and cacheable — a city is at most O(10^4) cells, so the entire live surge state of a market is a few hundred KB in a KV store. Reads dwarf writes.
The cost asymmetry is the headline — O(10^7) global cells but only a small hot fraction need full-cadence forecasting at any instant. Sizing full compute for all cells is the rookie over-spend; Step 8 spends only on hot cells.
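A quick back-of-envelope check of the "few hundred KB" claim, as a sketch. The 40-byte record size is an assumption made for illustration (key, two numbers, a timestamp, a source tag), not a measured figure; the cell count and cadence come from the scale table.

```python
# Back-of-envelope sizing for one market's live surge state.
# BYTES_PER_RECORD is an assumed figure for illustration, not a measurement.
ACTIVE_CELLS_LARGE_CITY = 10_000   # upper end of the O(10^3 - 10^4) range
BYTES_PER_RECORD = 40              # assumption: key + multiplier + additive + timestamp + source
REFRESH_SECONDS = 2                # fastest recompute cadence in the SLO table


def surge_state_kb(cells: int, bytes_per_record: int) -> float:
    """Total size of the hot multiplier store for one market, in KB."""
    return cells * bytes_per_record / 1024


def controller_writes_per_second(cells: int, refresh_seconds: float) -> float:
    """Write rate if every active cell is overwritten once per refresh."""
    return cells / refresh_seconds


state_kb = surge_state_kb(ACTIVE_CELLS_LARGE_CITY, BYTES_PER_RECORD)
writes = controller_writes_per_second(ACTIVE_CELLS_LARGE_CITY, REFRESH_SECONDS)
print(f"live state ~{state_kb:.0f} KB, controller writes ~{writes:.0f}/s")
```

Under these assumptions the whole market fits comfortably in memory on a single shard, which is why the sizing question is about ingest and forecasting, not the read path.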

API design

Two surfaces matter: the quote API the rider app calls on the hot path, and the forecast/control contract between the ML service and the controller. The hot path only ever reads a precomputed multiplier — it never forecasts inline.

Quote API (hot path)

The request carries an idempotency key so a retry — network drop, double-tap, app restart — returns the same frozen quote, never a re-price. Quote creation is idempotent on (rider, origin, dest, time-bucket).

    POST /quote
    Idempotency-Key: <key>
    body: { rider_id, origin, dest, ts }

    -> if key seen within TTL: return existing quote (200, cached)
    -> else: compute, freeze, store under key, return (201)

The returned quote is a frozen object honored within its TTL (schema in Step 4).
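The get-or-create behavior above can be sketched with an in-memory store. `QuoteStore`, `get_or_create`, and the `price_fn` callback are hypothetical names for illustration; a production version would sit on a shared store with an atomic set-if-absent, which this sketch does not attempt.

```python
import time
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    quote_id: str
    fare: float
    created_at: float
    expires_at: float


class QuoteStore:
    """In-memory stand-in for the frozen-quote store (illustrative only)."""

    def __init__(self, ttl_seconds: float = 120.0):
        self.ttl = ttl_seconds
        self._by_key = {}

    def get_or_create(self, idempotency_key, price_fn, now=None):
        """Return (quote, status): 200 if replayed from the key, 201 if newly frozen."""
        now = time.time() if now is None else now
        existing = self._by_key.get(idempotency_key)
        if existing is not None and now < existing.expires_at:
            return existing, 200            # retry within TTL: same quote, no re-price
        quote = Quote(
            quote_id=str(uuid.uuid4()),
            fare=price_fn(),                # priced once, then frozen
            created_at=now,
            expires_at=now + self.ttl,
        )
        self._by_key[idempotency_key] = quote
        return quote, 201


store = QuoteStore(ttl_seconds=120)
first, status_first = store.get_or_create("rider42:trip7", lambda: 18.50, now=0.0)
retry, status_retry = store.get_or_create("rider42:trip7", lambda: 25.00, now=90.0)
print(status_first, status_retry, retry.fare)
```

The retry at 90 seconds passes a different live price but still receives the original fare, which is the whole point of keying on the idempotency key rather than on the request body.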

Forecast / control contract (internal)

The forecasting service emits, per cell, a distributional demand/supply forecast — not a point estimate — so the controller can price for a chosen quantile under the guardrails.

    GET /forecast?cell=<h3>&horizon=10m
    -> {
         cell, horizon,
         demand: { p50, p90 },       # request count, quantiles
         supply: { p50, p90 },       # serviceable drivers
         price_response: {           # conditional on price p
           demand_at: f(p),          # elasticity curve handle
           supply_at: f(p)
         },
         confidence                  # widened for cold/sparse cells
       }

The controller consumes this, solves for a clearing price, applies damping, and writes a single number per cell to the KV store. No external caller touches the forecast on the request path.

Data model

Three stores, each shaped by its access pattern: the hot multiplier KV, the frozen-quote store, and the append-only feature/outcome log.

Geo-cell as the partition key

Every event, aggregate, multiplier, and quote is keyed by H3 cell. Cells are independent — there is no cross-cell transaction on the hot path, which is what lets a hot market shard away from a cold one.

Hot multiplier store (geo-sharded KV, e.g. Redis)

One small record per active cell, overwritten every 2–5 s by the controller:

    key: cell_id (H3)
    value: {
      multiplier,        # smoothed, damped, capped
      additive_surge,    # flat amount (see Step 6)
      updated_at,
      source             # live | last_known_good | default
    }

Frozen-quote store

Written at quote time, read on retry, expired by TTL:

    quote = {
      quote_id,
      idempotency_key,
      cell_id,
      multiplier,
      fare,          # base + time + distance (+) additive surge
      created_at,
      expires_at     # created_at + TTL (1–5 min)
    }

Within the TTL the rider is charged the quoted price even if live surge has moved — the consistency contract behind upfront pricing.

Feature / outcome log (offline)

Append-only, immutable, partitioned by time and cell. Every decision logs its inputs and what happened next so training and causal eval can reconstruct the counterfactual:

    row = {
      ts, cell_id,
      features,          # aggregates + external signals
      forecast,          # quantiles emitted
      multiplier_set,    # action taken
      outcome            # completed? wait? cancel? earnings?
    }

This log is the substrate for elasticity estimation and switchback analysis (Step 7); without logging the action alongside the outcome, no causal read is possible later.

High-level architecture & data flow

Two planes. The hot serving plane is optimized for latency and availability; the offline plane does feature logging, training, elasticity estimation, and switchback analysis. Keep them decoupled so a model retrain can never threaten the serving SLO.

        RIDER APP                     DRIVER APP
     (open / request)          (location / availability)
            |                             |
            v                             v
    +------------------- KAFKA -------------------+
    |          (events keyed by H3 cell)          |
    +----------------------+----------------------+
                           |
                           v
           FLINK windowed aggregation per cell
        open demand | fulfilled | avail | en-route
                           |
                           v
                  FORECASTING SERVICE          <--- external signals
          quantile demand & supply, per cell        (events, weather)
                           |
                           v
                  PRICING CONTROLLER
      clearing price -> smoothing / hysteresis / caps
                           |
                           v
      GEO-SHARDED KV (Redis) <- multiplier per cell
                           |
    ----- HOT PATH (p99 under 50 ms) ---+----------------
                           |
                           v
                     QUOTE SERVICE
      fare = base + time + dist (+/-) surge ; freeze TTL
                           |
                           v
          rider sees a frozen, idempotent quote

    OFFLINE PLANE: feature/outcome logs -> training,
    elasticity estimation, switchback experiment analysis

Ingest

Rider open-app/request events and driver location/availability events flow into Kafka, keyed by H3 cell so all of a cell’s signal lands on the same partition.

Real-time aggregation

Flink maintains windowed aggregates per cell over 1–5 min sliding windows: open demand, fulfilled demand, available supply, en-route supply. This mirrors the documented Kafka → Flink → KV surge pipeline pattern used at ride-hailing scale.
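The windowing logic itself is simple enough to sketch in plain Python. This is a stand-in for the idea, not Flink code: `SlidingCellCounter` is a hypothetical class, it keeps raw timestamps in memory, and it assumes events arrive in timestamp order per cell, which a real stream processor cannot assume.

```python
from collections import defaultdict, deque


class SlidingCellCounter:
    """Counts events per (cell, kind) over a trailing time window."""

    def __init__(self, window_seconds: float = 300.0):
        self.window = window_seconds
        self._events = defaultdict(deque)   # (cell_id, kind) -> event timestamps

    def add(self, cell_id: str, kind: str, ts: float) -> None:
        # Assumes in-order arrival per key; late events are out of scope here.
        self._events[(cell_id, kind)].append(ts)

    def count(self, cell_id: str, kind: str, now: float) -> int:
        q = self._events[(cell_id, kind)]
        while q and q[0] <= now - self.window:
            q.popleft()                      # evict events that fell out of the window
        return len(q)


counter = SlidingCellCounter(window_seconds=300)
for ts in (0, 100, 290):
    counter.add("cell-a", "open_demand", ts)

print(counter.count("cell-a", "open_demand", now=300))  # events at 100 and 290 remain
```

Because the key is the cell, each partition can run this independently, which is the property the Kafka keying in the ingest step is there to provide.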

Forecasting service

Consumes the aggregates plus external signals and emits short-horizon demand and supply forecasts with uncertainty per cell (deep dive in Step 6).

Pricing controller

Turns forecasts into a target multiplier, then applies smoothing / hysteresis / rate-caps / hard-caps before writing to the KV store (Step 6).

Quote service (the SDE-heavy serving path)

At request time the quote service reads the cell’s current smoothed multiplier from the geo-sharded KV cache in under 10 ms p99, computes the fare, and freezes the quote with a TTL and idempotency key. It never recomputes the forecast on the request path — the controller has already written a fresh, damped number; the quote path only reads.

Charge-time vs quote-time — within TTL: honor the quoted price, full stop. On expiry: re-quote at the current multiplier and require explicit rider re-confirm. Never silently charge a higher live surge than what the rider accepted.
Geo-sharding — partition the multiplier store and aggregation by H3 cell / city so a hot market (a holiday spike in one city) cannot contend with another's reads.
Graceful degradation — the quote path must degrade, not fail. If the multiplier store is unreachable, serve last-known-good (Step 6) rather than erroring. A rider should always get a price; the only question is how fresh.
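The charge-time rule in the list above reduces to one comparison. A minimal sketch, with `FrozenQuote` and `resolve_charge` as hypothetical names; the action strings are placeholders for whatever the payment flow actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrozenQuote:
    fare: float
    expires_at: float   # epoch seconds


def resolve_charge(quote: FrozenQuote, now: float, live_fare: float):
    """Decide what happens at charge time: honor the quote or ask again."""
    if now < quote.expires_at:
        # Within TTL: charge the quoted fare even if live surge has moved.
        return "charge", quote.fare
    # Expired: show the current price and wait for explicit re-confirmation.
    return "reconfirm", live_fare


quote = FrozenQuote(fare=18.50, expires_at=200.0)
print(resolve_charge(quote, now=150.0, live_fare=25.00))
print(resolve_charge(quote, now=250.0, live_fare=25.00))
```

Note that the expired branch never returns a charge action, so a higher live price cannot be applied silently.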

Deep dive — forecasting under feedback, incentive-compatible surge & control-loop stability

WHERE STAFF IS WON

This is where Staff is won. Three connected hard parts, each tracing back to the Step 0 crux that price affects demand: (A) forecasting conditional on price, (B) setting a market-clearing, incentive-compatible price, and (C) keeping the closed loop from oscillating.

A. Demand & supply forecasting under feedback

Targets. Forecast per cell over the 5–15 min horizon: demand = expected rider request count; supply = available + en-route + likely-to-reposition drivers. Forecast supply separately — drivers respond to price with a lag (they have to physically drive over), so supply is a slower, laggier signal than demand. Collapsing them into one model hides exactly the lag that later causes oscillation.

Model choices. Use probabilistic / quantile forecasts, not point estimates. Gradient-boosted trees (XGBoost) are a strong, cheap baseline; LSTM-style spatio-temporal nets capture sequence and neighborhood structure (the industry has published RNN/LSTM approaches for extreme-event demand forecasting). Quantiles matter because the controller can then price for P50 vs P90 demand — pricing to the median in a cell prone to spikes leaves riders stranded; pricing to P90 everywhere over-surges. The quantile is a knob the controller turns based on the guardrails.
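One common way to train and score a quantile forecast is the pinball (quantile) loss, which the paragraph above does not name explicitly, so treat it as an assumed choice. The sketch shows why a P90 forecast leans high: missing low costs nine times more than missing high by the same amount.

```python
def pinball_loss(actual: float, predicted: float, quantile: float) -> float:
    """Quantile loss for one observation; quantile is in (0, 1)."""
    error = actual - predicted
    # Under-forecast (error > 0) is weighted by q, over-forecast by (1 - q).
    return max(quantile * error, (quantile - 1.0) * error)


under = pinball_loss(actual=10, predicted=8, quantile=0.9)    # forecast too low
over = pinball_loss(actual=10, predicted=12, quantile=0.9)    # forecast too high
print(under, over)
```

At the median (quantile 0.5) the two penalties are equal, which is why a P50 model is indifferent between stranding riders and over-surging.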

Sparse-cell pooling. Most cells are low-volume. A cell with 2 requests this window must not produce a confident, noisy multiplier. Pool across H3 neighbors — the center hex plus its 6 immediate neighbors — and use hierarchical / partial pooling so sparse cells borrow strength from their neighborhood and the city while dense cells stay local.
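Partial pooling can be sketched as a shrinkage estimate: weight the local count by how much evidence it carries. The `prior_strength` value and the use of the raw count as the evidence weight are illustrative assumptions; a real hierarchical model would estimate both.

```python
def pooled_demand(local_count: float, neighbor_counts: list, prior_strength: float = 20.0) -> float:
    """Shrink a cell's demand toward its neighborhood mean when the cell is sparse."""
    # Neighborhood = the center cell plus its immediate neighbors.
    neighborhood_mean = (local_count + sum(neighbor_counts)) / (1 + len(neighbor_counts))
    # More local evidence -> more weight on the local number.
    weight_local = local_count / (local_count + prior_strength)
    return weight_local * local_count + (1.0 - weight_local) * neighborhood_mean


sparse = pooled_demand(2, [1, 0, 3, 2, 1, 2])   # quiet cell: follows the neighborhood
dense = pooled_demand(200, [10] * 6)            # busy cell: stays close to its own count
print(round(sparse, 2), round(dense, 1))
```

The sparse cell ends up near the neighborhood mean while the dense one barely moves, which is the "borrow strength only when you need it" behavior described above.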

External signals. The streaming window only sees what already happened. These let the forecast see a spike coming (preemptive surge):

| Signal | Why it predicts a spike |
|---|---|
| Calendar / events | Concert or game lets out, a wall of simultaneous requests |
| Weather | Rain collapses walk/bike trips into ride demand |
| Transit disruption | A subway or rail outage pushes stranded riders onto ride-hail |
| Time-of-day / day-of-week | Commute peaks, bar close, airport banks |

Price as a feature — the feedback-loop crux. The forecast must be conditional on the price being set: high surge suppresses demand (rider elasticity) and attracts supply (driver elasticity, lagged). So either feed price in as a feature of the demand/supply model, or explicitly model the curves D(price) and S(price). Either way the output is not “demand will be X” but “demand will be X if I set price p.” Without this, the controller forecasts demand as if its own action didn’t move it — then chases a target its previous action already shifted, which is the seed of oscillation.

Cold-start / new cell. A brand-new or just-gone-cold cell has no local history. Fall back to a city-level or neighbor-pooled prior and widen the uncertainty band so the controller stays conservative and parks the multiplier near 1.0. Better to under-react in an unknown cell than to invent a surge from noise.

B. Market-clearing price & incentive-compatible surge

Market-clearing formulation. Do not map a ratio to a price. Choose the price p where expected demand and expected serviceable supply meet:

    find p such that D(p) ~= S(p)

    D(p) = rider demand curve (falls as p rises)
    S(p) = serviceable supply (rises as p rises, lagged)

Surge exists to do two economic jobs: (1) ration scarce supply to the highest-value trips, and (2) pull more drivers in. A raw demand/supply ratio does neither on purpose — it just reacts.
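Solving D(p) ~= S(p) can be sketched with bisection. The constant-elasticity curves and their parameters below are made-up illustrations, not estimates from any market, and the search assumes excess demand falls monotonically as price rises.

```python
def clearing_multiplier(demand_at, supply_at, lo=1.0, hi=3.0, tol=1e-4):
    """Find p in [lo, hi] where demand_at(p) is approximately supply_at(p)."""
    def excess(p):
        return demand_at(p) - supply_at(p)

    if excess(lo) <= 0:
        return lo          # already oversupplied at the floor: no surge
    if excess(hi) >= 0:
        return hi          # still short at the cap: pin to the cap
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess(mid) > 0:
            lo = mid       # still excess demand: price must rise
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical curves: 120 requests and 80 drivers at 1.0x.
def demand_at(p):
    return 120 * p ** -1.2      # rider elasticity of 1.2 (assumed)


def supply_at(p):
    return 80 * p ** 0.5        # driver elasticity of 0.5 (assumed)


p_star = clearing_multiplier(demand_at, supply_at)
print(round(p_star, 3))
```

The two early returns are where the guardrails show up: the floor keeps a balanced market at 1.0x and the upper bound is the hard cap, so the solver never proposes a price the controller would have to clamp afterward.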

Elasticity. Solving for the clearing price needs both curves: rider price elasticity (how much demand drops per +10% price) and driver supply elasticity (how much supply rises per +10% price). Both vary by city, time-of-day, and segment (airport vs downtown, commuter vs night-out). Elasticity is estimated offline from historical price variation and, ideally, from the switchback experiments themselves (Step 7), which give cleaner causal price variation than observational data.
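As a sketch of the arithmetic only: under a constant-elasticity assumption, elasticity is the slope of log(quantity) against log(price). The function name is hypothetical and the data is synthetic; on observational data this slope is confounded (price was high because demand was high), which is why experimental price variation is preferred.

```python
import math


def log_log_elasticity(prices, quantities):
    """Least-squares slope of log(quantity) on log(price)."""
    xs = [math.log(p) for p in prices]
    ys = [math.log(q) for q in quantities]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Synthetic data generated with a true elasticity of -1.2.
prices = [1.0, 1.2, 1.5, 2.0]
quantities = [100 * p ** -1.2 for p in prices]
estimate = log_log_elasticity(prices, quantities)
print(round(estimate, 3))
```

A value near -1.2 reads as "a 10% price increase cuts demand by roughly 12%", which is the unit the paragraph above uses.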

Multiplicative vs additive surge (the Staff fact). Multiplicative surge multiplies the whole fare by, say, 1.8x. It is not incentive-compatible in a dynamic setting: it scales the per-mile rate, so the surge premium grows with trip length, overpaying long trips and underpaying short ones relative to actual marginal scarcity. That distorts which trips drivers accept and makes driver pay volatile. Additive surge adds a flat surge amount on top of the normal fare (normal fare + flat surge, independent of trip length). It is more incentive-compatible and gives drivers more stable, predictable earnings. This is the result from Garg & Nazerzadeh, Driver Surge Pricing, Management Science 2021, whose incentive-compatible mechanism is well-approximated by additive driver surge.

| | Multiplicative | Additive |
|---|---|---|
| Mechanism | Multiply whole fare | Flat amount added on top |
| Incentive compatibility | Low: distorts trip selection | High: closer to truthful |
| Driver earnings | Volatile, scales with trip length | Stabler, length-independent |
| Trip-length bias | Overpays long, underpays short | Removes length distortion |
| Status | Simple, legacy | Recommended; what to ship |

The short-vs-long-trip story is the tell that you understand why: under multiplicative surge a driver rationally cherry-picks long trips during surge; additive surge removes that distortion.
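A worked example makes the length distortion concrete. The fares, the 1.8x multiplier, and the flat amount are made-up numbers for illustration.

```python
def surge_premium_multiplicative(normal_fare: float, multiplier: float) -> float:
    """Extra pay from surge when the whole fare is multiplied."""
    return normal_fare * multiplier - normal_fare


def surge_premium_additive(normal_fare: float, flat_surge: float) -> float:
    """Extra pay from surge when a flat amount is added; fare does not matter."""
    return flat_surge


SHORT_TRIP, LONG_TRIP = 8.0, 40.0      # hypothetical normal fares
MULTIPLIER, FLAT_SURGE = 1.8, 6.0      # hypothetical surge settings

mult_short = surge_premium_multiplicative(SHORT_TRIP, MULTIPLIER)
mult_long = surge_premium_multiplicative(LONG_TRIP, MULTIPLIER)
add_short = surge_premium_additive(SHORT_TRIP, FLAT_SURGE)
add_long = surge_premium_additive(LONG_TRIP, FLAT_SURGE)

print(f"multiplicative premium: short ${mult_short:.2f}, long ${mult_long:.2f}")
print(f"additive premium:       short ${add_short:.2f}, long ${add_long:.2f}")
```

Under the multiplier the long trip carries five times the surge premium of the short one for the same pickup scarcity; under the flat amount the premium is identical, so there is nothing to cherry-pick.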

Objective / RL framing. You can pose pricing two ways: as constrained optimization (maximize completed-trip value subject to wait-time and cancel-rate guardrails — solve for the clearing price each interval), or as a value function over marketplace state (a controller that accounts for how today’s price changes tomorrow’s supply distribution). RL has a documented production footprint in this space, but on the matching / marketplace-balance side: a DQN-style value-function approach to driver value has been deployed across 400+ cities. That is the matching/value-function deployment, not a deployed pricing controller — so mention RL, but don’t over-index. A well-tuned elasticity model plus a damped control loop is the shippable v1; RL-for-pricing is the v2 once you have the eval and guardrails to trust it.

Reward-design nuance. Reward completed trips / conversion (and contribution margin), not raw instantaneous revenue. Maximizing instantaneous revenue prices riders out and starves the funnel: the model would happily set a high price, book one rich trip, and call it a win while ten riders abandon.

C. Anti-oscillation & control-loop stability

The job here is to connect control theory to marketplace economics and to treat guardrails as first-class design, not safety afterthoughts.

Why it oscillates. Walk the loop concretely:

1. Surge rises, price up.

2. Demand drops (rider elasticity) and drivers flock in (driver elasticity).

3. The next window sees oversupply.

4. Surge collapses toward 1.0x.

5. Demand returns and drivers leave.

6. Surge spikes again. Repeat.

The root cause is a timescale mismatch: driver-repositioning lag is minutes, but the recompute cadence is seconds. A controller reacting in seconds to an actuator that takes minutes to land is the textbook setup for pumping in a closed loop — always responding to the previous over-correction. Naming this — “the loop pumps because the actuator’s dead time exceeds the control interval” — is the single highest-signal sentence in the interview.
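The dead-time argument can be reproduced with a deliberately tiny toy model. Everything here is an assumption made for illustration: the market has a fixed clearing multiplier of 1.5, the imbalance the controller observes reflects the price it set `lag` steps ago, and the controller applies a proportional correction each step. It is not a marketplace simulator.

```python
def simulate(gain: float, lag: int = 3, steps: int = 120,
             clearing: float = 1.5, lo: float = 1.0, hi: float = 3.0):
    """Proportional controller acting on a signal that is `lag` steps stale."""
    prices = [lo] * (lag + 1)              # market starts un-surged
    for _ in range(steps):
        # What we observe now is the effect of the price set `lag` steps ago.
        observed_gap = prices[-1 - lag] - clearing
        proposed = prices[-1] - gain * observed_gap
        prices.append(min(hi, max(lo, proposed)))   # clamp to [floor, cap]
    return prices[lag + 1:]


pumping = simulate(gain=0.8)    # aggressive correction against a stale signal
settled = simulate(gain=0.1)    # gentle correction, same lag

swing = max(pumping[-22:]) - min(pumping[-22:])
print(f"aggressive: still swinging by {swing:.2f}x at the end")
print(f"gentle:     final multiplier {settled[-1]:.3f}")
```

With the same lag, only the gain differs: the aggressive controller keeps overshooting and crashing back to the floor, while the gentle one converges on the clearing price. Shrinking the lag would also stabilize the aggressive controller, but the lag is driver travel time, so the gain is the knob you actually own.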

Damping toolkit. Each technique buys stability and spends responsiveness. Name that tradeoff every time.

| Technique | What it does | Tradeoff |
|---|---|---|
| Temporal smoothing (EWMA / moving avg) | Filters high-freq jitter in the multiplier | Adds lag; slower to react to a real spike |
| Hysteresis (asymmetric thresholds) | Uses separate raise and lower thresholds (raise readily, lower reluctantly) so the multiplier doesn't chatter around one setpoint | Sticky in one direction; can hold surge slightly too long |
| Per-step rate limits | Cap multiplier change per interval | Can't snap to a sudden, genuine shock |
| Spatial pooling (center + 6 neighbors) | One noisy cell can't whipsaw the price | Blurs a truly localized spike |
| Deadband near 1.0x | Ignore tiny imbalances | Tolerates mild, sub-threshold imbalance |

Hysteresis is worth a sentence of depth: it’s a hysteretic comparator — separate raise and lower thresholds so the multiplier doesn’t dither across a single setpoint. You’d typically raise quickly (riders waiting is the urgent failure) and lower slowly (so a one-window dip in demand doesn’t yank the price down and re-trigger the spike). The meta-point: tune damping against the driver-repositioning time constant, not against the recompute cadence. The loop must be slower than the thing it’s trying to steer.
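The damping toolkit composes into a single update step. This is a sketch of one possible ordering (smooth, deadband, hysteresis, rate limit, cap); `DampingConfig`, `next_multiplier`, and every threshold value are illustrative assumptions that would be tuned against the driver-repositioning time constant.

```python
from dataclasses import dataclass


@dataclass
class DampingConfig:
    ewma_alpha: float = 0.3        # weight on the newest target
    deadband: float = 0.05         # treat anything this close to 1.0x as 1.0x
    raise_threshold: float = 0.10  # small gap needed to raise
    lower_threshold: float = 0.25  # larger gap needed to lower
    max_step: float = 0.2          # per-interval rate limit
    hard_cap: float = 3.0


def next_multiplier(current, smoothed_target, raw_target, cfg):
    """Return (new_multiplier, new_smoothed_target) for one control interval."""
    # 1. Temporal smoothing of the controller's raw target.
    smoothed = cfg.ewma_alpha * raw_target + (1 - cfg.ewma_alpha) * smoothed_target
    # 2. Deadband: ignore tiny imbalances around neutral.
    desired = 1.0 if abs(smoothed - 1.0) < cfg.deadband else smoothed
    # 3. Hysteresis: hold unless the gap clears the direction's threshold.
    gap = desired - current
    if 0 < gap < cfg.raise_threshold or -cfg.lower_threshold < gap < 0:
        desired = current
    # 4. Rate limit, then 5. floor and hard cap.
    step = max(-cfg.max_step, min(cfg.max_step, desired - current))
    return min(cfg.hard_cap, max(1.0, current + step)), smoothed


cfg = DampingConfig()
spike, _ = next_multiplier(current=1.0, smoothed_target=1.0, raw_target=2.5, cfg=cfg)
dip, _ = next_multiplier(current=1.5, smoothed_target=1.4, raw_target=1.4, cfg=cfg)
print(spike, dip)
```

A raw target of 2.5x moves the served multiplier only to about 1.2x in one interval, and a small dip below the current value is held rather than followed, which is the stability being bought with responsiveness.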

Caps & regulatory.

Hard multiplier cap — commonly ~2–3x in normal ops, configurable. This is a product/guardrail decision, not merely a safety rail; it bounds rider outrage and PR exposure.
Regulatory / price-gouging caps — geo-fenced rules that clamp or disable surge during declared emergencies/disasters. This is a compliance requirement and must be a first-class rule the controller cannot override — it sits above the optimization, not inside it.

Anti-gaming. Drivers can collude to log off simultaneously to manufacture artificial scarcity and trigger surge. Defend with anomaly detection on supply drops — implausibly fast, correlated availability collapses are flagged and smoothed/ignored rather than priced in. The smoothing and rate-caps above already blunt the worst of it; the anomaly check is the targeted layer.

Failure / fallback. Fail safe = cheap, never accidentally surge from missing data:

Pipeline stale / store outage: serve last-known-good multiplier with time-decay toward 1.0 — the longer we're blind, the closer to neutral we drift.
Cold / empty cell: default to 1.0x.
Stale forecasts: widen uncertainty, which pulls the controller toward neutral.

The asymmetry is deliberate: an erroneous high multiplier overcharges and enrages riders and regulators; an erroneous low one just costs some margin. When blind, bias low.
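Last-known-good with time-decay can be sketched as exponential decay of the surge excess toward neutral. The half-life value and the function name are assumptions for illustration.

```python
def fallback_multiplier(last_known_good, age_seconds: float,
                        half_life_seconds: float = 120.0) -> float:
    """Decay a stale multiplier toward 1.0x; the surge excess halves every half-life."""
    if last_known_good is None:
        return 1.0                                   # cold / empty cell: neutral
    decay = 0.5 ** (age_seconds / half_life_seconds)
    return 1.0 + (last_known_good - 1.0) * decay


print(fallback_multiplier(2.0, age_seconds=0))       # fresh: served as-is
print(fallback_multiplier(2.0, age_seconds=120))     # one half-life blind
print(fallback_multiplier(None, age_seconds=10))     # no history at all
```

The decay only ever moves toward 1.0x, so a blind pipeline can under-price but cannot invent or grow a surge.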

Rollout & causal evaluation under interference

The MLE-heavy rollout block. You cannot ship a pricing change by eyeballing it — the headline is that a naive user-split A/B test gives a biased read, and a Staff candidate must say why and what to do instead, then sequence the v1/v2 launch.

Why naive A/B is biased

Randomize riders into treatment and control, and both groups still compete for the same pool of drivers. Treatment surge pulls drivers toward treated riders, changing the experience of control riders. That violates SUTVA (the stable unit treatment value assumption: one unit’s outcome must not depend on another unit’s assignment), so treatment leaks into control and the measured lift is biased by marketplace interference. This is the causal pitfall to name; everything else here is the remedy.

Switchback experiments

Randomize treatment by time interval across the whole market: flip the pricing algorithm every 30–60 min so that everyone in the market at a given moment sees one variant. There’s no cross-arm contamination through the shared driver pool because, at any instant, there’s only one arm. This is the standard interference-robust design at ride-hailing platforms (Bojinov, Simchi-Levi & Zhao, Design and Analysis of Switchback Experiments; switchback testing is used at marketplaces including Lyft).
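Arm assignment for a switchback can be sketched as a deterministic function of market and time bucket, so every service resolves the same arm without coordination. The salt, the hashing scheme, and the function name are illustrative assumptions, and hashing gives a simple coin flip per interval rather than any optimized switching schedule.

```python
import hashlib


def switchback_arm(market: str, ts_seconds: float,
                   interval_seconds: int = 1800, salt: str = "pricing-exp-1") -> str:
    """Everyone in `market` during the same interval gets the same arm."""
    bucket = int(ts_seconds // interval_seconds)
    digest = hashlib.sha256(f"{salt}:{market}:{bucket}".encode()).digest()
    return "treatment" if digest[0] % 2 == 0 else "control"


# Two requests in the same 30-minute interval always agree.
same_interval = switchback_arm("sf", 100) == switchback_arm("sf", 1799)
arms_seen = {switchback_arm("sf", b * 1800) for b in range(200)}
print(same_interval, sorted(arms_seen))
```

Changing the salt per experiment keeps schedules independent across experiments running in the same market.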

Cluster / region randomization

Alternatively, randomize by city or geo-cluster so treated and control markets don’t share supply at all (Airbnb’s cluster-randomization meta-experiment for pricing). Cleaner separation than switchback, but coarser units mean far fewer of them — variance and confounding across cities become the concern.

Carryover & variance

Switchback estimators have two costs to budget for:

Higher variance — precision scales with the number of switches, so you need long runs.
Carryover — a past surge interval changes the future supply distribution (drivers repositioned during treatment are still there in the next control interval). Mitigate with burn-in windows between switches that you exclude from analysis.
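Burn-in exclusion can be sketched as a filter applied before the arms are compared. This is a simplified difference of interval-level means with hypothetical names and synthetic numbers; it treats each interval as one unit and ignores the variance estimation a real analysis needs.

```python
from collections import defaultdict
from statistics import mean


def switchback_effect(observations, interval_seconds=1800, burn_in_seconds=300):
    """observations: (seconds_since_start, arm, outcome). Returns treatment minus control."""
    per_interval = defaultdict(list)
    for ts, arm, outcome in observations:
        if ts % interval_seconds < burn_in_seconds:
            continue                      # drop the burn-in right after each switch
        per_interval[(int(ts // interval_seconds), arm)].append(outcome)

    arm_means = defaultdict(list)
    for (_, arm), outcomes in per_interval.items():
        arm_means[arm].append(mean(outcomes))   # one data point per interval
    return mean(arm_means["treatment"]) - mean(arm_means["control"])


observations = [
    (100, "control", 0.9),      # inside burn-in, excluded
    (600, "control", 0.5),
    (1200, "control", 0.5),
    (1900, "treatment", 0.1),   # inside burn-in, excluded
    (2400, "treatment", 0.7),
    (3000, "treatment", 0.7),
]
effect = switchback_effect(observations)
print(round(effect, 3))
```

The excluded rows are the ones most likely to carry the previous arm's driver positions, so dropping them trades sample size for a cleaner comparison.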

Debiasing

More advanced designs use shadow-price / two-sided methods to correct residual interference bias (Reducing Marketplace Interference Bias via Shadow Prices, Management Science). Cite it as the v2 sophistication, not the v1 default.

Metrics & offline gate

| Online | Offline |
|---|---|
| Completed trips / conversion | Forecast calibration (quantile coverage) |
| Rider wait time | Elasticity backtest |
| Cancel rate | Simulated counterfactual P&L |
| Driver earnings & utilization | |
| GMV / contribution margin | |

Simulate the counterfactual P&L of a new pricing policy before it touches a live switchback, so experiments validate already-plausible policies rather than burning slow switchback capacity on obviously-bad ones.

What I’d ship — v1 vs v2

Ship in v1:

Streaming H3 aggregation (Kafka → Flink → KV).
Calibrated quantile demand/supply forecast with neighbor pooling.
Elasticity-based clearing price with smoothing + hysteresis + rate caps + hard cap.
Additive surge.
Frozen, idempotent quotes (TTL 1–5 min, sub-10ms lookup).
Last-known-good fallback throughout.
Switchback evaluation.

Defer to v2:

Constrained-RL / value-function controller for pricing (the matching-side DQN deployment in 400+ cities is the precedent, not yet a pricing controller).
Per-segment elasticity.
Preemptive, event-driven surge.
Shadow-price debiased evaluation.

Bottlenecks, scaling & evolution

Bottlenecks

1. Aggregation/forecast hot path during city-wide spikes — the moment you most need fresh numbers is when the pipeline is most loaded.

2. Feedback-loop stability under sudden supply shocks — a stadium emptying tests the damping directly.

3. Eval throughput — switchbacks are slow (long runs, burn-in windows), so a backlog of pricing experiments forms. Experimentation, not compute, is often the real bottleneck on shipping pricing changes.

Observability

Per-cell time-series of multiplier, demand/supply forecast, and realized outcome — to see oscillation, not infer it after riders complain.
Oscillation detector: alert when the multiplier reverses direction (sign flips in its step-to-step change) more often per unit time than a threshold; that is the loop pumping.
Fallback-source ratio — what fraction of quotes served last_known_good or default; a rising ratio means the pipeline is blind somewhere.
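The oscillation detector in the list above can be sketched as a count of direction reversals in a cell's recent multiplier series. The `min_move` noise floor and the function name are illustrative assumptions; the alert threshold on the count is left to the operator.

```python
def direction_flips(multipliers, min_move: float = 0.05) -> int:
    """Count how often the multiplier switches between rising and falling."""
    flips = 0
    last_direction = 0
    for previous, current in zip(multipliers, multipliers[1:]):
        delta = current - previous
        if abs(delta) < min_move:
            continue                       # ignore jitter below the noise floor
        direction = 1 if delta > 0 else -1
        if last_direction and direction != last_direction:
            flips += 1
        last_direction = direction
    return flips


pumping = [1.0, 1.4, 1.0, 1.5, 1.1, 1.6]
ramping = [1.0, 1.1, 1.2, 1.3]
print(direction_flips(pumping), direction_flips(ramping))
```

A steady ramp scores zero however steep it is, so the metric separates a genuine demand spike from a loop that is chasing its own corrections.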

Hot-market scaling

Geo-shard by H3/city; autoscale Flink and the forecasting service per market.
Pre-warm caches before known events (concerts, holidays) so the cold-start path never runs during the spike.
Isolate hot cells so one market's spike can't degrade another's SLO.

Cost lever

Most cells are quiet. Compute full-cadence forecasts only for hot cells; serve coarser/cached numbers for cold ones and reuse neighbor-pooled forecasts. This is where the O(10^7) global-cell count stops being scary — you only pay full price for the small hot fraction.

Closing judgment

The hard 20% is the feedback loop, incentive-compatible surge, and causal eval. Everything else is well-understood streaming and serving. Ship the simple, well-damped controller before reaching for RL.


Summary

Through-line: surge is a controller that sets a market-clearing price under a closed feedback loop — not an auction, an aggregation, or an assignment.
Staff is won in exactly three places:

1. The price-affects-demand feedback loop and its anti-oscillation control (smoothing, hysteresis, rate caps, neighbor pooling — tuned against driver-repositioning lag).

2. Incentive-compatible surge — additive over multiplicative (Garg & Nazerzadeh, Management Science 2021), with the short-vs-long-trip reasoning.

3. Causal eval that survives marketplace interference — switchback / cluster randomization, carryover/variance budgeting, shadow-price debiasing.

Always-state objective: maximize completed-trip value subject to wait-time, cancel-rate, driver-earnings, and regulatory guardrails — never instantaneous revenue.
Concrete spine to remember: H3 cells (~0.1–0.5 sq mi) → Kafka → Flink windowed aggregation → quantile demand/supply forecast → elasticity-based clearing price → smoothing + hysteresis + rate-cap + hard-cap → geo-sharded KV → frozen idempotent quote (TTL 1–5 min, sub-10ms lookup) → switchback eval, with last-known-good fallback throughout.
Red flags to avoid: a raw demand/supply ratio with no damping; treating demand as price-independent; user-split A/B as the eval; and surging from missing/stale data instead of failing safe to 1.0x.


Rubric — Senior vs Staff

| Dimension | Senior signal | Staff signal |
|---|---|---|
| Problem framing & objective | Treats surge as a demand/supply ratio: multiplier = f(demand/supply), clamped. Optimizes 'don't have unserved riders.' | Frames it as setting a market-clearing price under a closed feedback loop with an explicit constrained objective (e.g., maximize completed-trip value / GMV subject to a wait-time and rider-cancel guardrail), and states the price-affects-demand loop up front as the central difficulty. |
| Demand/supply forecasting | One demand model per cell (gradient-boosted or LSTM), point forecast, 5–15 min horizon. | Quantile / probabilistic short-horizon forecasts per H3 cell with spatial pooling for sparse cells; forecasts supply (available + en-route + repositioning) separately; uses external signals (events, weather, transit disruptions) and treats price as a feature so the forecast is conditional on the price being set. |
| Pricing / control logic | Maps imbalance to a multiplier with a lookup table or sigmoid; caps at e.g. 3x. | Models elasticity on both sides, picks a price that equilibrates expected demand and supply, and makes it incentive-compatible (additive surge over multiplicative; Garg & Nazerzadeh 2021) so short vs long trips and driver earnings stay sane; may frame as constrained RL / value-function over marketplace state. |
| Anti-oscillation & stability | Mentions smoothing or a moving average. | Explicitly engineers the control loop: EWMA/temporal smoothing, hysteresis (asymmetric raise/lower thresholds), per-step rate limits, neighbor-cell pooling, and damping tuned against driver-repositioning latency; reasons about why a tight loop pumps and names the stability-vs-responsiveness tradeoff. |
| Serving, latency & quote consistency | Reads a multiplier from Redis at quote time; recomputes per request. | Sub-10ms geo-sharded multiplier lookup; freezes the quote with a TTL (e.g. 1–5 min) and an idempotency key so retries/re-requests don't re-price; honors the quoted price within TTL even as the live surge moves; defines the consistency contract between quote-time and charge-time. |
| Experimentation & causal eval | A/B test: split riders into treatment/control, compare GMV. | Recognizes user-split A/B is biased by marketplace interference (treated and control share the same drivers); uses switchback (time-randomized) and/or cluster/region randomization, reasons about carryover and variance, and may cite shadow-price / two-sided designs to debias. |
| Guardrails, fairness & failure modes | Caps the multiplier; mentions a default of 1.0 on failure. | Last-known-good multiplier + decay on pipeline failure; regulatory/price-gouging caps that are geo-fenced (e.g., disaster states); anti-gaming against drivers inducing artificial surge; fairness/PR guardrails; clear behavior when forecasts are stale or a cell goes cold. |
| Staff signals / depth of tradeoffs | Covers the happy path end to end. | Drives the conversation to the 2-3 places where it's genuinely hard (the feedback loop, incentive compatibility, causal eval), quantifies tradeoffs with defensible numbers, and is explicit about what they'd ship v1 vs defer. |
