perf(graph): cache merged SDL and SchemaUpdate per ref #841

Merged
argoyle merged 1 commits from precompute-schema-cache into main 2026-05-19 07:51:51 +00:00


Author SHA1 Message Date
argoyle d652c1e446 perf(graph): cache merged SDL and SchemaUpdate per ref
schemas / vulnerabilities (pull_request) Successful in 2m8s
schemas / check (pull_request) Successful in 3m5s
schemas / check-release (pull_request) Successful in 5m14s
pre-commit / pre-commit (pull_request) Successful in 6m55s
schemas / build (pull_request) Successful in 5m44s
schemas / deploy-prod (pull_request) Has been skipped

Both the Supergraph and LatestSchema resolvers recomputed their results on
every request, and the work is non-trivial:

- Supergraph: sdlmerge.MergeSDLs() runs AST validation + normalization
  + custom merge walkers over all subgraph SDLs.
- LatestSchema: CosmoGenerator.Generate() shells out to wgc router
  compose (Node via npx), using 100-300m CPU (Kubernetes millicores)
  while each call runs.

Because the output is fully determined by the set of subgraph SDLs,
which is versioned by the lastUpdate timestamp of its (orgId, ref) key,
the result can be cached and reused across requests until a
SubGraphUpdated event bumps lastUpdate for that key.

Add two precomputation caches to cache.Cache, both versioned by the
existing lastUpdate map so a single timestamp comparison invalidates
stale entries implicitly:

- mergedSDLs: cached MergeSDLs output for Supergraph
- schemaUpdates: cached SchemaUpdate (subgraphs + cosmo config) for
  LatestSchema

The UpdateSubGraph debounce already computes the cosmo config to
publish through PubSub; it now also stores the SchemaUpdate so the
next LatestSchema query is warm. OrganizationRemoved evicts the
organization's entries from both caches alongside lastUpdate.

This eliminates the per-request CPU bursts that were tripping the
HPA into TooManyReplicas territory.
2026-05-19 09:37:43 +02:00