fix(k8s): add scaleUp/scaleDown stabilization to schemas HPA
schemas / vulnerabilities (pull_request) Successful in 2m2s
schemas / check (pull_request) Successful in 2m42s
schemas / check-release (pull_request) Successful in 2m59s
pre-commit / pre-commit (pull_request) Successful in 6m18s
schemas / build (pull_request) Successful in 5m47s
schemas / deploy-prod (pull_request) Has been skipped

Even with the schema cache and startup warmup, residual CPU bursts
(metrics-server occasionally samples a request mid-flight) were enough
to trip a brief scale-up, after which the default 5min scaleDown
stabilization pinned the deployment at maxReplicas long after the
spike had subsided.

Tune both directions:

- scaleUp.stabilizationWindowSeconds: 120 — a transient spike must
  persist for two consecutive minutes before any pod is added.
  Brief metric anomalies no longer move replicas.
- scaleUp policy: add at most 1 pod per 60s. Smooths reaction.
- scaleDown.stabilizationWindowSeconds: 120 (default 300) — once
  the workload calms, return to minReplicas faster.
- scaleDown policy: remove at most 1 pod per 60s. Avoids
  thundering-herd scale-down.
This commit is contained in:
2026-05-21 18:55:51 +02:00
parent 4e50a051d0
commit eefab471d6
+20
View File
@@ -18,3 +18,23 @@ spec:
target:
type: Utilization
averageUtilization: 60
behavior:
scaleUp:
# Wait 2min of sustained high CPU before scaling up. Schemas is
# event-driven and the per-request work is bursty even with the
# cache + warmup, so single spikes shouldn't pull replicas up.
stabilizationWindowSeconds: 120
policies:
- type: Pods
value: 1
periodSeconds: 60
scaleDown:
# Default 300s window kept pods pinned at maxReplicas long after
# the triggering spike had subsided. 120s is long enough to avoid
# flapping but lets the deployment return to minReplicas quickly
# once the workload calms.
stabilizationWindowSeconds: 120
policies:
- type: Pods
value: 1
periodSeconds: 60