perf(graph): warm schema cache on startup to kill cold-start spikes #843


argoyle commented

2026-05-21 15:11:36 +00:00

Following the schema cache PR (#841), warm pods serve from cache. New pods start cold: the first LatestSchema query per (orgId, ref) runs wgc router compose (100-300m CPU). That keeps tripping the HPA into TooManyReplicas: HPA scales up → new cold pod → wgc spike → HPA scales up further. In production, pods were observed cycling 2→4→2→4 with cpu 1%/60%, and fresh pods showed 2 cold computes within their first minute.

Fix: add Cache.AllOrgRefs() and Resolver.WarmCache(ctx). Service startup calls WarmCache right after the Resolver is wired, before the HTTP server accepts traffic, so the first LatestSchema query a pod receives is now a cache hit. The trade-off is that pod start time grows by roughly N × cosmo-compose latency, where N is the number of tracked (orgId, ref) pairs.


argoyle added 1 commit 2026-05-21 15:11:37 +00:00

perf(graph): warm schema cache on startup to kill cold-start spikes

schemas / vulnerabilities (pull_request) Successful in 2m11s


schemas / check-release (pull_request) Successful in 3m8s


schemas / check (pull_request) Successful in 3m30s


pre-commit / pre-commit (pull_request) Successful in 7m13s


schemas / build (pull_request) Successful in 6m30s


schemas / deploy-prod (pull_request) Has been skipped


1549538c70

Following the schema cache PR, warm pods serve from cache (~24/25 hits
on a long-running pod). New pods, however, start cold: the first
LatestSchema query per (orgId, ref) still runs the wgc router compose
subprocess, which costs 100-300m CPU per call.

That cold-start cost is what kept tripping the HPA into TooManyReplicas:
HPA scales up → new pod added → new pod runs wgc on first query →
metrics spike → HPA scales up further → cycle repeats. Even after the
caching PR landed, we observed pods cycling 2→4→2→4 in production, with
fresh pods showing 2 'Fetching latest schema' (cold) entries and 0
cache hits within their first minute.

Add Cache.AllOrgRefs() exposing every tracked (orgId, ref) pair, and
Resolver.WarmCache(ctx) which iterates them after the event-sourced
caches have been populated. For each ref it fetches the subgraphs,
runs sdlmerge, runs CosmoGenerator.Generate, and stores both results
in the cache. Errors per ref are logged and skipped so a single bad
ref does not block warming the rest.

Service startup calls WarmCache right after the Resolver is wired,
before the HTTP server starts accepting traffic, so the first
LatestSchema query a pod receives is already a cache hit.

argoyle scheduled this pull request to auto merge when all checks succeed 2026-05-21 15:11:51 +00:00

argoyle merged commit 4e50a051d0 into main

2026-05-21 15:25:51 +00:00

argoyle deleted branch warm-schema-cache-on-startup

2026-05-21 15:25:52 +00:00

argoyle referenced this pull request

2026-05-21 16:56:23 +00:00

fix(k8s): add scaleUp/scaleDown stabilization to schemas HPA #844


Reference: unboundsoftware/schemas#843