What 429s taught me about product design

A ski trip finder needs hundreds of API calls. Rate limits forced smarter design—snow-first filtering, caching, and streaming results.

I wanted to build a ski trip finder that searches flexible date windows. You pick “sometime in the next 2 weeks” and the app finds the best combination of snow conditions and flight prices.

The math was brutal. 3 origin airports × 15 destinations × 7 departure dates × 3 trip durations = 945 API calls per search. Within an hour of the first working prototype, my logs were full of 429s.

The math problem

Flight search APIs aren’t built for this use case. They’re optimized for “show me JFK to Denver on January 15th” — one request, one response. I was asking for “show me the best of 945 possibilities.”

The rate limits hit immediately:

Amadeus error for LGA->DEN: [429]
Amadeus error for EWR->DEN: [429]
Amadeus error for LGA->DEN: [429]

Worse: if a user clicked search twice by accident, the second batch of requests would stack on top of the first and blow through the rate limit entirely.

Firefighting

The first fix was a “cheapest dates” API endpoint that returned the best prices across a date range in a single call. One request per route instead of 21 (7 departure dates × 3 trip durations). API calls dropped from 630 to 30 for a typical search.

But the endpoint didn’t work for many routes. Some returned 500 errors, others just timed out. I built a blocklist that automatically tracked failing routes and fell back to sampling for those.
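Roughly, the fallback logic looked like this. This is a sketch, not the production code: `search_cheapest_dates`, `search_exact_dates`, and `ApiError` are hypothetical stand-ins for the real provider calls, and in the real app the blocklist would persist between searches.

```python
failing_routes: set[tuple[str, str]] = set()  # routes where the cheapest-dates endpoint breaks

def search_route(origin: str, dest: str, dates: list[str]):
    route = (origin, dest)
    if route not in failing_routes:
        try:
            # One call covering the whole date window.
            return search_cheapest_dates(origin, dest, dates[0], dates[-1])
        except ApiError:
            failing_routes.add(route)  # remember the failure, fall through to sampling
    # Fallback: sample a few dates instead of hitting all 21 combinations.
    sampled = [dates[0], dates[len(dates) // 2], dates[-1]]
    return [search_exact_dates(origin, dest, d) for d in sampled]
```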

That helped. But I was still hitting rate limits on popular routes, still seeing 429s pile up during peak hours.

The pivot

After a week of patching around unreliable responses, I switched providers entirely. The new provider charged per-search instead of rate-limiting — more predictable, more reliable. But now every API call cost money.

$0.005 per API call doesn’t sound like much. But at 465 calls per user search, that’s $2.33 every time someone hits the search button. At even modest traffic, that adds up fast.

The question that changed everything: what if users hit cache instead of the API?

With a shared cache and 1-hour TTL, the first user pays $2.33. Users 2-10 hit warm cache — $0. Effective cost: $0.23 per search. At 70% cache hit rate, monthly costs at 1,000 users drop from $2,330 to under $700.
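Spelled out as a quick sanity check, with the same numbers as above:

```python
cost_per_call = 0.005        # provider price per API call
calls_per_search = 465       # cold-cache user search
cold_cost = cost_per_call * calls_per_search     # ~$2.33

amortized = cold_cost / 10                       # ~$0.23 if users 2-10 all hit warm cache

cache_hit_rate = 0.70
monthly_cost = cold_cost * (1 - cache_hit_rate) * 1_000   # just under $700/month at 1,000 searches
```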

Caching made the per-search pricing viable. But I still needed to reduce that cold-cache cost.

The inversion

Snow conditions change slowly — daily at most, really just when storms roll through. Flight prices change constantly, multiple times per day. I’d been treating them as equally expensive to fetch.

The insight: use the stable data to filter before hitting the volatile data.

Instead of searching flights to all 30+ destinations, I could score them by snow conditions first. Why show flights to a resort with no new snow and a thin base? Users don’t want to see those results anyway.

This led to a three-phase search:

Phase 1: Snow triage. Fetch snow data for all destinations. Score each one. Filter to destinations above a snow threshold. The threshold is set at 50 — deliberately permissive, since “good powder” is around 70. I’d rather show more options than hide something worth considering.

Phase 2: Cache scout. Check the cache for any existing price data on the remaining destinations. Rank by a combination of snow score and cached price. This costs nothing — it’s just reading from the database.

Phase 3: Expand. Search flights only for the top 8 destinations. Use strategic date sampling — early, mid, and late in the window — instead of every possible date.
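Stitched together, the search flow looks roughly like this. Every function name here (`fetch_snow_score`, `get_cached_price`, `sample_dates`, `search_flights`) is a stand-in for the real implementation, and `filter_by_snow` is sketched a bit further down:

```python
SNOW_THRESHOLD = 50    # deliberately permissive; "good powder" is around 70
MAX_EXPAND = 8         # only the top destinations get fresh, paid API calls

def run_search(origins, destinations, date_window):
    # Phase 1: snow triage. Slow-changing data first, fetched for everything.
    scores = {dest: fetch_snow_score(dest) for dest in destinations}
    candidates = filter_by_snow(scores, threshold=SNOW_THRESHOLD, minimum=MAX_EXPAND)

    # Phase 2: cache scout. Free: it only reads the database.
    def rank_key(dest):
        cached = get_cached_price(origins, dest, date_window)   # None if nothing cached
        return (scores[dest], -cached if cached is not None else float("-inf"))
    ranked = sorted(candidates, key=rank_key, reverse=True)     # best snow, then cheapest cached fare

    # Phase 3: expand. Fresh flight searches only for the top destinations,
    # sampling early / mid / late dates instead of every combination.
    results = []
    for dest in ranked[:MAX_EXPAND]:
        for date in sample_dates(date_window):   # e.g. 3 dates instead of 7 x 3
            results.extend(search_flights(origins, dest, date))
    return results
```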

Result: 465 API calls dropped to 24-34. Cost per search went from $2.33 to $0.45.

The 8-destination limit was a gut call based on how many resorts a user would realistically compare. (The hub-based destination model helped here — users think in hub regions, not individual airports.) The “always show at least 8 even if below threshold” logic was defensive design — I anticipated the edge case of a dry spell where nothing passes the snow filter. Empty results would be worse than showing imperfect options.
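The backfill itself is just a few lines, assuming the same hypothetical names as the sketch above:

```python
def filter_by_snow(scores: dict[str, float], threshold: float, minimum: int) -> list[str]:
    # Keep everything above the (permissive) threshold.
    passing = [dest for dest, score in scores.items() if score >= threshold]
    if len(passing) >= minimum:
        return passing
    # Dry spell: top up with the best of the rest so the results page is never empty.
    rest = sorted((d for d in scores if d not in passing), key=scores.get, reverse=True)
    return passing + rest[: minimum - len(passing)]
```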

The architecture

With per-search costs manageable, I could build infrastructure around it.

Cache warming. A cron job runs every 6 hours, pre-populating the cache for top routes. It pulls from two sources: search analytics (what users actually searched in the past 7 days) and alert subscribers (what people are actively monitoring). The warming script respects the 30 req/min rate limit with 2-second delays between requests.
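A sketch of that warming loop. The helpers for pulling routes from analytics and alerts, plus `fetch_prices` and `cache_set`, are assumed names; the 2-second sleep is the part that maps directly to the 30 req/min limit:

```python
import time

WARM_DELAY_SECONDS = 2     # 30 requests per minute means at most one call every 2 seconds

def warm_cache():
    routes = top_routes_from_analytics(days=7) + routes_from_alert_subscribers()
    for origin, dest, date in dict.fromkeys(routes):     # dedupe, keep order
        prices = fetch_prices(origin, dest, date)        # the paid API call
        cache_set(f"{origin}:{dest}:{date}", prices)     # single-origin key (this shape matters below)
        time.sleep(WARM_DELAY_SECONDS)                   # stay under the rate limit
```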

The cache key bug. For a few days, the warming script ran successfully but searches still felt slow. Routes that should have been warm weren’t hitting cache.

The problem: the warming script cached flights with single-origin keys (JFK:DEN:2026-01-20). The frontend queried with combined-origin keys (EWR,JFK,LGA:DEN:2026-01-20). The keys never matched.

I discovered this while dogfooding — running searches myself and noticing 3+ second loads for routes that should have been instant. Looking at the cache, the entries were there — just under different keys.

The fix: store single-origin keys, merge results at read time. Cache warming populates JFK:DEN, EWR:DEN, LGA:DEN separately. When the frontend asks for all three origins, the cache service fetches and combines them.
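In code, the read path ends up looking something like this (again with hypothetical `cache_get`, `cache_set`, and `fetch_prices` helpers):

```python
def get_flights(origins: list[str], dest: str, date: str) -> list[dict]:
    """Read-time merge: single-origin cache entries in, one combined result out."""
    merged: list[dict] = []
    for origin in sorted(origins):                       # e.g. ["EWR", "JFK", "LGA"]
        key = f"{origin}:{dest}:{date}"                  # same key shape the warmer writes
        flights = cache_get(key)
        if flights is None:                              # only cold origins hit the paid API
            flights = fetch_prices(origin, dest, date)
            cache_set(key, flights)
        merged.extend(flights)
    return merged
```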

Streaming partial results. With caching in place, some destinations load instantly while others need fresh API calls. Instead of waiting for everything, the frontend shows cached results immediately and fills in the rest as they arrive. The UI went from “wait 8 seconds, see everything” to “see 3 destinations in 200ms, then 2 more every second.” (This was one of the main reasons I moved away from Streamlit — it couldn’t handle per-destination streaming updates.)
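One way to get that behavior is an async generator that yields each destination the moment its prices are ready, so warm-cache destinations stream out in milliseconds while cold ones follow. A sketch (the helper names are assumptions, and the transport to the browser, whether SSE or websockets, is a separate concern):

```python
import asyncio

async def stream_results(origins, destinations, date_window):
    """Yield per-destination results as they complete, warm cache first."""

    async def load(dest):
        cached = get_cached_flights(origins, dest, date_window)   # hypothetical cache read
        if cached is not None:
            return dest, cached                                   # back in milliseconds
        return dest, await fetch_prices_async(origins, dest, date_window)  # seconds

    tasks = [asyncio.create_task(load(dest)) for dest in destinations]
    for finished in asyncio.as_completed(tasks):
        dest, flights = await finished
        yield {"destination": dest, "flights": flights}           # frontend renders each card as it arrives
```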

What changed

Each constraint forced a better product:

Rate limits forced a smarter algorithm. Brute-forcing 945 combinations was lazy. The three-phase approach with snow filtering is actually more useful — it surfaces good options instead of burying them in a wall of data.

API costs forced snow-first filtering. This turned out to be better UX anyway. Why show a $300 flight to a resort with no snow? The constraint aligned with what users actually want.

Cache complexity forced streaming updates. Waiting for everything to load felt broken. Showing partial results feels fast, even when the total time is similar.

The final architecture — snow triage, strategic sampling, cache warming, streaming results — handles the same flexible date search that hammered the API on day one. A search that used to make 630 API calls and hit rate limits now makes 24-34 calls, mostly from warm cache.

I didn’t plan any of this. The 429s taught me.


---
If you enjoyed this post, you can subscribe here to get updates on new content. I don't spam and you can unsubscribe at any time.