fix(k8s): add scaleUp/scaleDown stabilization to schemas HPA #844

Merged

argoyle merged 1 commits from tune-hpa-behavior into main

2026-05-21 17:10:19 +00:00

Author	SHA1	Message	Date
argoyle	eefab471d6	fix(k8s): add scaleUp/scaleDown stabilization to schemas HPA schemas / vulnerabilities (pull_request) Successful in 2m2s Details schemas / check (pull_request) Successful in 2m42s Details schemas / check-release (pull_request) Successful in 2m59s Details pre-commit / pre-commit (pull_request) Successful in 6m18s Details schemas / build (pull_request) Successful in 5m47s Details schemas / deploy-prod (pull_request) Has been skipped Details Even with the schema cache and startup warmup, residual CPU bursts (metrics-server occasionally samples a request mid-flight) were enough to trip a brief scale-up, after which the default 5min scaleDown stabilization pinned the deployment at maxReplicas long after the spike had subsided. Tune both directions: - scaleUp.stabilizationWindowSeconds: 120 — a transient spike must persist for two consecutive minutes before any pod is added. Brief metric anomalies no longer move replicas. - scaleUp policy: add at most 1 pod per 60s. Smooths reaction. - scaleDown.stabilizationWindowSeconds: 120 (default 300) — once the workload calms, return to minReplicas faster. - scaleDown policy: remove at most 1 pod per 60s. Avoids thundering-herd scale-down.	2026-05-21 18:55:51 +02:00

Author

SHA1

Message

Date

argoyle

eefab471d6

fix(k8s): add scaleUp/scaleDown stabilization to schemas HPA

schemas / vulnerabilities (pull_request) Successful in 2m2s

Details

schemas / check (pull_request) Successful in 2m42s

Details

schemas / check-release (pull_request) Successful in 2m59s

Details

pre-commit / pre-commit (pull_request) Successful in 6m18s

Details

schemas / build (pull_request) Successful in 5m47s

Details

schemas / deploy-prod (pull_request) Has been skipped

Details

Even with the schema cache and startup warmup, residual CPU bursts
(metrics-server occasionally samples a request mid-flight) were enough
to trip a brief scale-up, after which the default 5min scaleDown
stabilization pinned the deployment at maxReplicas long after the
spike had subsided.

Tune both directions:

- scaleUp.stabilizationWindowSeconds: 120 — a transient spike must
  persist for two consecutive minutes before any pod is added.
  Brief metric anomalies no longer move replicas.
- scaleUp policy: add at most 1 pod per 60s. Smooths reaction.
- scaleDown.stabilizationWindowSeconds: 120 (default 300) — once
  the workload calms, return to minReplicas faster.
- scaleDown policy: remove at most 1 pod per 60s. Avoids
  thundering-herd scale-down.

2026-05-21 18:55:51 +02:00

fix(k8s): add scaleUp/scaleDown stabilization to schemas HPA #844

1 Commits