fix(k8s): add scaleUp/scaleDown stabilization to schemas HPA

Even with the schema cache and startup warmup, residual CPU bursts (metrics-server occasionally samples a request mid-flight) were enough to trip a brief scale-up, after which the default 5min scaleDown stabilization pinned the deployment at maxReplicas long after the spike had subsided. Tune both directions: - scaleUp.stabilizationWindowSeconds: 120 — a transient spike must persist for two consecutive minutes before any pod is added. Brief metric anomalies no longer move replicas. - scaleUp policy: add at most 1 pod per 60s. Smooths reaction. - scaleDown.stabilizationWindowSeconds: 120 (default 300) — once the workload calms, return to minReplicas faster. - scaleDown policy: remove at most 1 pod per 60s. Avoids thundering-herd scale-down.
2026-05-21 18:55:51 +02:00
parent 4e50a051d0
commit eefab471d6
1 changed files with 20 additions and 0 deletions
@@ -18,3 +18,23 @@ spec:
      target:
        type: Utilization
        averageUtilization: 60
+  behavior:
+    scaleUp:
+      # Wait 2min of sustained high CPU before scaling up. Schemas is
+      # event-driven and the per-request work is bursty even with the
+      # cache + warmup, so single spikes shouldn't pull replicas up.
+      stabilizationWindowSeconds: 120
+      policies:
+      - type: Pods
+        value: 1
+        periodSeconds: 60
+    scaleDown:
+      # Default 300s window kept pods pinned at maxReplicas long after
+      # the triggering spike had subsided. 120s is long enough to avoid
+      # flapping but lets the deployment return to minReplicas quickly
+      # once the workload calms.
+      stabilizationWindowSeconds: 120
+      policies:
+      - type: Pods
+        value: 1
+        periodSeconds: 60