Even with the schema cache and startup warmup, residual CPU bursts
(metrics-server occasionally samples a request mid-flight) were enough
to trip a brief scale-up, after which the default 5min scaleDown
stabilization pinned the deployment at maxReplicas long after the
spike had subsided.
Tune both directions:
- scaleUp.stabilizationWindowSeconds: 120 — a transient spike must
persist for two consecutive minutes before any pod is added.
Brief metric anomalies no longer move replicas.
- scaleUp policy: add at most 1 pod per 60s. Smooths reaction.
- scaleDown.stabilizationWindowSeconds: 120 (default 300) — once
the workload calms, return to minReplicas faster.
- scaleDown policy: remove at most 1 pod per 60s. Avoids
thundering-herd scale-down.