perf(graph): cache merged SDL and SchemaUpdate per ref #841

2026-05-19T07:38:22Z

argoyle commented

2026-05-19 07:38:22 +00:00

Both Supergraph and LatestSchema resolvers were recomputing their result on every request.

Why this matters

- Supergraph: sdlmerge.MergeSDLs() runs AST validation, normalization, and custom merge walkers over all subgraph SDLs.
- LatestSchema: CosmoGenerator.Generate() shells out to wgc router compose (Node via npx), which costs 100-300m CPU per call.

The output is fully determined by the set of subgraph SDLs and their lastUpdate timestamp, so it can be cached and reused across requests until a SubGraphUpdated event bumps lastUpdate for the (orgId, ref) key.

What this PR does

Adds two precomputation caches to cache.Cache, both versioned by the existing lastUpdate map so a single timestamp comparison invalidates stale entries implicitly:

- mergedSDLs: cached MergeSDLs output for Supergraph
- schemaUpdates: cached SchemaUpdate (subgraphs + cosmo config) for LatestSchema
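A minimal sketch of the version check, not the code in this PR. The `Cache` type, `refKey`, `Touch`, `supergraph`, and `ts` are hypothetical names, the `merge` callback stands in for sdlmerge.MergeSDLs(), and lastUpdate is assumed to be a time.Time. Each entry remembers the lastUpdate it was computed for, so a lookup is a hit only while that timestamp still matches the current one.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// refKey identifies one schema ref within an organization (hypothetical type).
type refKey struct {
	OrgID string
	Ref   string
}

// versioned pairs a cached SDL with the lastUpdate it was computed for.
type versioned struct {
	sdl     string
	version time.Time
}

// Cache is a stripped-down, hypothetical stand-in for cache.Cache.
type Cache struct {
	mu         sync.RWMutex
	lastUpdate map[refKey]time.Time
	mergedSDLs map[refKey]versioned
}

func NewCache() *Cache {
	return &Cache{
		lastUpdate: make(map[refKey]time.Time),
		mergedSDLs: make(map[refKey]versioned),
	}
}

// Touch records a new lastUpdate, as a SubGraphUpdated event would.
func (c *Cache) Touch(k refKey, at time.Time) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.lastUpdate[k] = at
}

// LastUpdate returns the current version for k (zero time if unknown).
func (c *Cache) LastUpdate(k refKey) time.Time {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.lastUpdate[k]
}

// MergedSDL returns the cached SDL only if it matches the current lastUpdate.
func (c *Cache) MergedSDL(k refKey) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	e, ok := c.mergedSDLs[k]
	if !ok || !e.version.Equal(c.lastUpdate[k]) {
		return "", false // missing or stale: no explicit invalidation needed
	}
	return e.sdl, true
}

// SetMergedSDL stores sdl together with the version it was computed for.
func (c *Cache) SetMergedSDL(k refKey, version time.Time, sdl string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.mergedSDLs[k] = versioned{sdl: sdl, version: version}
}

// supergraph serves from the cache and calls merge only on a miss.
func supergraph(c *Cache, k refKey, merge func() (string, error)) (string, error) {
	if sdl, ok := c.MergedSDL(k); ok {
		return sdl, nil
	}
	// Read the version before merging: if an update lands during the merge,
	// the stored entry carries the old version and is treated as stale.
	version := c.LastUpdate(k)
	sdl, err := merge()
	if err != nil {
		return "", err
	}
	c.SetMergedSDL(k, version, sdl)
	return sdl, nil
}

// ts builds a deterministic timestamp for the demo.
func ts(sec int64) time.Time { return time.Unix(sec, 0) }

func main() {
	c := NewCache()
	k := refKey{OrgID: "org-1", Ref: "main"}
	calls := 0
	merge := func() (string, error) {
		calls++
		return "type Query { ok: Boolean }", nil
	}
	c.Touch(k, ts(1))
	supergraph(c, k, merge) // miss: merge runs
	supergraph(c, k, merge) // hit: served from cache
	c.Touch(k, ts(2))       // simulated SubGraphUpdated
	supergraph(c, k, merge) // miss again: entry is stale
	fmt.Println("merge calls:", calls) // prints "merge calls: 2"
}
```

Storing the version alongside the value means the update path only has to bump lastUpdate; it never has to find and delete dependent entries.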

The UpdateSubGraph debounce already computes the cosmo config to publish through PubSub; it now also stores the SchemaUpdate so the next LatestSchema query is warm. OrganizationRemoved evicts both caches alongside lastUpdate.
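The warm-on-update and eviction paths could look roughly like the sketch below. It is illustrative only: `Warm`, `Bump`, `RemoveOrganization`, `refKey`, `ts`, and the two-field `SchemaUpdate` struct are assumptions, not the types or methods used in this repository.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// refKey identifies one schema ref within an organization (hypothetical type).
type refKey struct {
	OrgID string
	Ref   string
}

// SchemaUpdate is a simplified, hypothetical shape of the cached payload.
type SchemaUpdate struct {
	Ref         string
	CosmoConfig string
}

type entry struct {
	update  SchemaUpdate
	version time.Time
}

// Cache is a stripped-down, hypothetical stand-in for cache.Cache.
type Cache struct {
	mu            sync.Mutex
	lastUpdate    map[refKey]time.Time
	schemaUpdates map[refKey]entry
}

func NewCache() *Cache {
	return &Cache{
		lastUpdate:    make(map[refKey]time.Time),
		schemaUpdates: make(map[refKey]entry),
	}
}

// Bump records a new lastUpdate without storing a result.
func (c *Cache) Bump(k refKey, at time.Time) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.lastUpdate[k] = at
}

// Warm is what the debounced update path would call: it already holds the
// freshly computed config, so it stores it under the new version.
func (c *Cache) Warm(k refKey, at time.Time, u SchemaUpdate) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.lastUpdate[k] = at
	c.schemaUpdates[k] = entry{update: u, version: at}
}

// SchemaUpdate returns the cached update if it matches the current lastUpdate.
func (c *Cache) SchemaUpdate(k refKey) (SchemaUpdate, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.schemaUpdates[k]
	if !ok || !e.version.Equal(c.lastUpdate[k]) {
		return SchemaUpdate{}, false
	}
	return e.update, true
}

// RemoveOrganization evicts every ref of the organization from both maps.
func (c *Cache) RemoveOrganization(orgID string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for k := range c.lastUpdate {
		if k.OrgID == orgID {
			delete(c.lastUpdate, k)
		}
	}
	for k := range c.schemaUpdates {
		if k.OrgID == orgID {
			delete(c.schemaUpdates, k)
		}
	}
}

// ts builds a deterministic timestamp for the demo.
func ts(sec int64) time.Time { return time.Unix(sec, 0) }

func main() {
	c := NewCache()
	k := refKey{OrgID: "org-1", Ref: "main"}

	c.Warm(k, ts(1), SchemaUpdate{Ref: "main", CosmoConfig: "{}"})
	_, warm := c.SchemaUpdate(k)
	fmt.Println("warm after update:", warm)

	c.RemoveOrganization("org-1")
	_, still := c.SchemaUpdate(k)
	fmt.Println("cached after removal:", still)
}
```

Evicting by organization walks both maps under the same lock, so a removed organization cannot leave an entry behind that a later lookup would match.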

Expected effect

Eliminates the per-request CPU bursts that were tripping the HPA into TooManyReplicas territory, the symptom that drove #840 (https://gitea.unbound.se/unboundsoftware/schemas/pulls/840). After the first request per (orgId, ref) following an update, subsequent queries are served from the in-memory cache in microseconds.

argoyle added 1 commit 2026-05-19 07:38:24 +00:00

perf(graph): cache merged SDL and SchemaUpdate per ref

schemas / vulnerabilities (pull_request) Successful in 2m8s

schemas / check (pull_request) Successful in 3m5s

schemas / check-release (pull_request) Successful in 5m14s

pre-commit / pre-commit (pull_request) Successful in 6m55s

schemas / build (pull_request) Successful in 5m44s

schemas / deploy-prod (pull_request) Has been skipped

d652c1e446

Both Supergraph and LatestSchema resolvers recomputed their result on
every request. The work is non-trivial:

- Supergraph: sdlmerge.MergeSDLs() runs AST validation + normalization
  + custom merge walkers over all subgraph SDLs.
- LatestSchema: CosmoGenerator.Generate() shells out to wgc router
  compose (Node via npx), spending 100-300m CPU per call.

Because the output is fully determined by the set of subgraph SDLs and
their lastUpdate timestamp, the result can be cached and reused across
requests until a SubGraphUpdated event bumps the lastUpdate for the
(orgId, ref) key.

Add two precomputation caches to cache.Cache, both versioned by the
existing lastUpdate map so a single timestamp comparison invalidates
stale entries implicitly:

- mergedSDLs: cached MergeSDLs output for Supergraph
- schemaUpdates: cached SchemaUpdate (subgraphs + cosmo config) for
  LatestSchema

The UpdateSubGraph debounce already computes the cosmo config to
publish through PubSub; it now also stores the SchemaUpdate so the
next LatestSchema query is warm. OrganizationRemoved evicts both
caches alongside lastUpdate.

This eliminates the per-request CPU bursts that were tripping the
HPA into TooManyReplicas territory.

argoyle scheduled this pull request to auto merge when all checks succeed 2026-05-19 07:38:39 +00:00

argoyle merged commit 39cf6fbb8c into main

2026-05-19 07:51:51 +00:00

argoyle deleted branch precompute-schema-cache

2026-05-19 07:51:53 +00:00

argoyle referenced this pull request

2026-05-21 15:11:36 +00:00

perf(graph): warm schema cache on startup to kill cold-start spikes #843

argoyle referenced this pull request

2026-05-21 16:56:23 +00:00

fix(k8s): add scaleUp/scaleDown stabilization to schemas HPA #844
