Scaling groups
Scaling groups let a workspace maintain a target number of hosts automatically. They watch CPU and memory signals (and related reconciliation state), compare them to thresholds you configure, and add or remove servers within min/max bounds, typically on a roughly per-minute cadence in standard scheduler configurations.
This page is for SRE, FinOps, and platform buyers who need to understand behavior, risks, and governance, not just field definitions.
When scaling groups shine
- Variable traffic services where horizontal scale on VMs is cheaper than always-on peak capacity.
- Batch or CI fleets that need to burst during business hours.
- Kubernetes worker pools where node count should follow utilization without manual ticketing.
When to be cautious
- Stateful workloads with local disks you cannot afford to lose: downscale policies in default templates may terminate the newest nodes first (last-in, first-out).
- License-bound software that charges per socket or per host: autoscaling can produce surprise costs if the max is set too high.
- Cold-start sensitive apps: adding nodes does not instantly add capacity if image pulls or bootstrap is slow.
How decisions are made (conceptual)
- Metrics are collected from the fleet (CPU/memory snapshots; scope may be per group or whole workspace depending on configuration).
- Thresholds define when to scale out (high utilization) and scale in (sustained low utilization).
- Cooldown windows reduce flapping when metrics oscillate.
- Min/max caps prevent runaway growth or accidental shrink to zero.
- Reconciliation turns the desired count into provider calls that create or remove hosts using your compute profile (a sketch of the full loop follows this list).
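To make the loop concrete, here is a minimal single-tick sketch. Every name (ScalingGroup, tick, provider_create, provider_remove) is illustrative, and the CPU-only trigger, one-host step size, and cooldown handling are simplifying assumptions, not the platform's actual scheduler logic.

```python
import time
from dataclasses import dataclass, field

@dataclass
class ScalingGroup:
    # Illustrative fields; see the configuration surface below for the real knobs.
    min_hosts: int = 1
    max_hosts: int = 10
    cpu_upscale_pct: float = 95.0    # scale out at or above this
    cpu_downscale_pct: float = 50.0  # scale in at or below this
    cooldown_s: int = 300            # suppress flapping between scale events
    members: list = field(default_factory=list)  # newest host appended last
    last_scale_at: float = 0.0

def provider_create(group: ScalingGroup) -> str:
    return f"host-{len(group.members) + 1}"  # stand-in for a real provider call

def provider_remove(host: str) -> None:
    pass  # stand-in for a real provider call

def tick(group: ScalingGroup, avg_cpu_pct: float) -> None:
    """One scheduler pass: metrics -> desired count -> provider calls."""
    now = time.time()
    if now - group.last_scale_at < group.cooldown_s:
        return  # cooldown window: skip this tick even if metrics oscillate

    desired = len(group.members)
    if avg_cpu_pct >= group.cpu_upscale_pct:
        desired += 1  # high utilization: scale out
    elif avg_cpu_pct <= group.cpu_downscale_pct:
        desired -= 1  # sustained low utilization: scale in

    # Min/max caps prevent runaway growth or accidental shrink to zero.
    desired = max(group.min_hosts, min(group.max_hosts, desired))
    if desired == len(group.members):
        return

    # Reconciliation: turn the intended size into create/remove calls.
    while len(group.members) < desired:
        group.members.append(provider_create(group))
    while len(group.members) > desired:
        provider_remove(group.members.pop())  # LIFO victim selection
    group.last_scale_at = now
```

A real scheduler would also aggregate metrics over a rolling window and track intended vs observed size, as described in the reference below.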
Configuration surface (reference)
- Scope - metrics can target the group alone or the whole workspace fleet, depending on how the group was defined.
- Thresholds - CPU and memory upscale and downscale percentages (template defaults are commonly in the high 90s for upscale and around 50% for downscale).
- Sizing - min / max host counts, compute profile binding, cooldown between scale operations, victim selection on downscale (often last-in-first-out style), optional keep windows, and mode (e.g. static sizing).
- Reconciliation - rolling metric aggregation, scheduler interval, intended vs observed size, timestamps for the last scale event.
- Live view - current member count and latest metrics snapshot the scheduler used.
- Membership - hosts attached to the group, maintained by automation.
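A hypothetical sketch of how these fields might fit together; the key names and values are illustrative, not the platform's actual schema:

```python
group_config = {
    "scope": "group",                # or "workspace": which fleet the metrics cover
    "thresholds": {
        "cpu_upscale_pct": 95,       # high-90s upscale is a common template default
        "cpu_downscale_pct": 50,
        "mem_upscale_pct": 95,
        "mem_downscale_pct": 50,
    },
    "sizing": {
        "min_hosts": 2,
        "max_hosts": 20,
        "compute_profile": "general-8cpu-32gb",  # illustrative profile name
        "cooldown_s": 300,
        "victim_selection": "lifo",  # newest hosts removed first
        "mode": "auto",              # e.g. "static" would pin the count
    },
    "reconciliation": {
        "metric_window_s": 60,       # rolling aggregation window
        "scheduler_interval_s": 60,  # roughly per-minute cadence
    },
}
```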
Provider resolution
Creating hosts through a scaling group may claim capacity from operator-managed compute pools (see Compute provider model). If your workspace lacks eligible provider capacity, scale-out stalls with capacity-style errors; the fix is operator-side (more pool capacity, a different region, or a higher quota).
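As a rough illustration of that failure mode, the sketch below shows scale-out stalling when no eligible pool has free capacity. CapacityError, claim_host, and the pool fields are hypothetical names, not the platform's API.

```python
class CapacityError(Exception):
    """No eligible provider pool can satisfy the request (operator fix needed)."""

def claim_host(profile: str, pools: list[dict]) -> str:
    # Try each operator-managed pool that matches the compute profile.
    for pool in pools:
        if pool["profile"] == profile and pool["free"] > 0:
            pool["free"] -= 1
            return f"host-in-{pool['region']}"
    # The scheduler cannot fix this on its own: it needs more pool capacity,
    # a different region, or a higher quota, so the group stays under target.
    raise CapacityError(f"no eligible capacity for profile {profile!r}")

pools = [{"profile": "general-8cpu", "region": "eu-west", "free": 0}]
try:
    claim_host("general-8cpu", pools)
except CapacityError as e:
    print(e)  # surfaces as a capacity-style error on the scaling group
```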
Permissions and audit
Scaling group APIs are workspace-scoped capabilities: creating a group, editing thresholds, and forcing scale actions should be RBAC-gated separately from read-only monitoring roles.
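One way to model that separation, with capability names that are purely illustrative rather than the platform's actual RBAC vocabulary:

```python
# Read-only monitoring role vs a write-capable operator role.
MONITOR_ROLE = {"scaling_group.read"}
OPERATOR_ROLE = MONITOR_ROLE | {
    "scaling_group.create",
    "scaling_group.edit_thresholds",
    "scaling_group.force_scale",
}

def can(role: set[str], capability: str) -> bool:
    return capability in role

assert can(OPERATOR_ROLE, "scaling_group.force_scale")
assert not can(MONITOR_ROLE, "scaling_group.force_scale")  # monitoring stays read-only
```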
For enterprise patterns (SoD, break-glass, audit expectations), see Auditing and fine-grained access.
Operational guidance
- Tune thresholds to avoid flapping on bursty workloads; use longer smoothing windows if metrics are noisy (see the smoothing sketch after this list).
- Keep max fleet size aligned with budget and quota agreements; pair with cost alerts where available.
- Remember that downscale often uses last-in, first-out host selection in default templates; verify that this matches your stateful workload policy.
- Test scale-out and scale-in in staging with realistic traffic; synthetic tiny tests miss provider timeouts and image pull delays.
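A minimal sketch of the smoothing idea, assuming per-minute CPU samples; SmoothedMetric and the window size are illustrative choices, not platform settings:

```python
from collections import deque
from statistics import mean

class SmoothedMetric:
    """Compare thresholds against a rolling average, not raw samples."""
    def __init__(self, window: int = 10):
        self.samples = deque(maxlen=window)  # longer window = less flapping

    def observe(self, cpu_pct: float) -> float:
        self.samples.append(cpu_pct)
        return mean(self.samples)

m = SmoothedMetric(window=10)
# A single 99% spike no longer trips a high-90s upscale threshold:
for cpu in (40, 42, 38, 99, 41):
    avg = m.observe(cpu)
print(round(avg, 1))  # 52.0, well under the upscale threshold
```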