perf(graph): cache merged SDL and SchemaUpdate per ref #841

argoyle merged 1 commit from precompute-schema-cache into main 2026-05-19 07:51:51 +00:00

Both Supergraph and LatestSchema resolvers were recomputing their result on every request.

Why this matters

  • Supergraph: sdlmerge.MergeSDLs() runs AST validation + normalization + custom merge walkers over all subgraph SDLs.
  • LatestSchema: CosmoGenerator.Generate() shells out to wgc router compose (Node via npx), spending 100-300m (millicores) of CPU per call.

The output is fully determined by the set of subgraph SDLs and their lastUpdate timestamp, so it can be cached and reused across requests until a SubGraphUpdated event bumps lastUpdate for the (orgId, ref) key.

What this PR does

Adds two precomputation caches to cache.Cache, both versioned by the existing lastUpdate map so a single timestamp comparison invalidates stale entries implicitly:

  • mergedSDLs — cached MergeSDLs output for Supergraph
  • schemaUpdates — cached SchemaUpdate (subgraphs + cosmo config) for LatestSchema

The UpdateSubGraph debounce already computes the cosmo config to publish through PubSub; it now also stores the SchemaUpdate so the next LatestSchema query is warm. OrganizationRemoved evicts both caches alongside lastUpdate.

Expected effect

Eliminates the per-request CPU bursts that were tripping the HPA into TooManyReplicas territory (the symptom that drove #840, https://gitea.unbound.se/unboundsoftware/schemas/pulls/840). After the first request for each (orgId, ref) following an update, subsequent queries are served from the in-memory cache in microseconds.

argoyle added 1 commit 2026-05-19 07:38:24 +00:00
perf(graph): cache merged SDL and SchemaUpdate per ref
schemas / vulnerabilities (pull_request) Successful in 2m8s
schemas / check (pull_request) Successful in 3m5s
schemas / check-release (pull_request) Successful in 5m14s
pre-commit / pre-commit (pull_request) Successful in 6m55s
schemas / build (pull_request) Successful in 5m44s
schemas / deploy-prod (pull_request) Has been skipped
d652c1e446
Both Supergraph and LatestSchema resolvers recomputed their result on
every request. The work is non-trivial:

- Supergraph: sdlmerge.MergeSDLs() runs AST validation + normalization
  + custom merge walkers over all subgraph SDLs.
- LatestSchema: CosmoGenerator.Generate() shells out to wgc router
  compose (Node via npx), spending 100-300m CPU per call.

Because the output is fully determined by the set of subgraph SDLs and
their lastUpdate timestamp, the result can be cached and reused across
requests until a SubGraphUpdated event bumps the lastUpdate for the
(orgId, ref) key.

Add two precomputation caches to cache.Cache, both versioned by the
existing lastUpdate map so a single timestamp comparison invalidates
stale entries implicitly:

- mergedSDLs: cached MergeSDLs output for Supergraph
- schemaUpdates: cached SchemaUpdate (subgraphs + cosmo config) for
  LatestSchema

The UpdateSubGraph debounce already computes the cosmo config to
publish through PubSub; it now also stores the SchemaUpdate so the
next LatestSchema query is warm. OrganizationRemoved evicts both
caches alongside lastUpdate.

This eliminates the per-request CPU bursts that were tripping the
HPA into TooManyReplicas territory.
argoyle scheduled this pull request to auto merge when all checks succeed 2026-05-19 07:38:39 +00:00
argoyle merged commit 39cf6fbb8c into main 2026-05-19 07:51:51 +00:00
argoyle deleted branch precompute-schema-cache 2026-05-19 07:51:53 +00:00
Reference: unboundsoftware/schemas#841