Scaling groups
Scaling groups let a workspace maintain a target number of hosts automatically. They watch CPU and memory signals (and related reconciliation state), compare them to thresholds you configure, and add or remove servers within min/max bounds, typically on a roughly per-minute cadence in standard scheduler configurations.
This page is for SRE, FinOps, and platform buyers who need to understand behavior, risks, and governance, not just field definitions.
When scaling groups shine
- Variable traffic services where horizontal scale on VMs is cheaper than always-on peak capacity.
- Batch or CI fleets that need to burst during business hours.
- Kubernetes worker pools where node count should follow utilization without manual ticketing.
When to be cautious
- Stateful workloads with local disks you cannot afford to lose: downscale policies in default templates may terminate the newest nodes first (last-in, first-out).
- License-bound software that charges per socket or per host: autoscaling can produce surprise costs if the max is set too high.
- Cold-start sensitive apps: adding nodes does not instantly add capacity if image pulls or bootstrap is slow.
How decisions are made (conceptual)
- Metrics are collected from the fleet (CPU/memory snapshots; scope may be per group or whole workspace depending on configuration).
- Thresholds define when to scale out (high utilization) and scale in (sustained low utilization).
- Cooldown windows reduce flapping when metrics oscillate.
- Min/max caps prevent runaway growth or accidental shrink to zero.
- Reconciliation turns the desired count into provider calls that create or remove hosts using your compute profile (a sketch of the full loop follows this list).
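To make the loop concrete, here is a minimal single-tick sketch. Every name (ScalingGroup, tick, provider_create, provider_remove) is illustrative, and the CPU-only trigger, one-host step size, and cooldown handling are simplifying assumptions, not the platform's actual scheduler logic.

```python
import time
from dataclasses import dataclass, field

@dataclass
class ScalingGroup:
    # Illustrative fields; see the configuration surface below for the real knobs.
    min_hosts: int = 1
    max_hosts: int = 10
    cpu_upscale_pct: float = 95.0    # scale out at or above this
    cpu_downscale_pct: float = 50.0  # scale in at or below this
    cooldown_s: int = 300            # suppress flapping between scale events
    members: list = field(default_factory=list)  # newest host appended last
    last_scale_at: float = 0.0

def provider_create(group: ScalingGroup) -> str:
    return f"host-{len(group.members) + 1}"  # stand-in for a real provider call

def provider_remove(host: str) -> None:
    pass  # stand-in for a real provider call

def tick(group: ScalingGroup, avg_cpu_pct: float) -> None:
    """One scheduler pass: metrics -> desired count -> provider calls."""
    now = time.time()
    if now - group.last_scale_at < group.cooldown_s:
        return  # cooldown window: skip this tick even if metrics oscillate

    desired = len(group.members)
    if avg_cpu_pct >= group.cpu_upscale_pct:
        desired += 1  # high utilization: scale out
    elif avg_cpu_pct <= group.cpu_downscale_pct:
        desired -= 1  # sustained low utilization: scale in

    # Min/max caps prevent runaway growth or accidental shrink to zero.
    desired = max(group.min_hosts, min(group.max_hosts, desired))
    if desired == len(group.members):
        return

    # Reconciliation: turn the intended size into create/remove calls.
    while len(group.members) < desired:
        group.members.append(provider_create(group))
    while len(group.members) > desired:
        provider_remove(group.members.pop())  # LIFO victim selection
    group.last_scale_at = now
```

A real scheduler would also aggregate metrics over a rolling window and track intended vs observed size, as described in the reference below.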
Configuration surface (reference)
- Scope - metrics can target the group alone or the whole workspace fleet, depending on how the group was defined.
- Thresholds - CPU and memory upscale and downscale percentages (template defaults are commonly in the high 90s for upscale and around 50% for downscale).
- Sizing - min / max host counts, compute profile binding, cooldown between scale operations, victim selection on downscale (often last-in-first-out style), optional keep windows, and mode (e.g. static sizing).
- Reconciliation - rolling metric aggregation, scheduler interval, intended vs observed size, timestamps for the last scale event.
- Live view - current member count and latest metrics snapshot the scheduler used.
- Membership - hosts attached to the group, maintained by automation.
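A hypothetical sketch of how these fields might fit together; the key names and values are illustrative, not the platform's actual schema:

```python
group_config = {
    "scope": "group",                # or "workspace": which fleet the metrics cover
    "thresholds": {
        "cpu_upscale_pct": 95,       # high-90s upscale is a common template default
        "cpu_downscale_pct": 50,
        "mem_upscale_pct": 95,
        "mem_downscale_pct": 50,
    },
    "sizing": {
        "min_hosts": 2,
        "max_hosts": 20,
        "compute_profile": "general-8cpu-32gb",  # illustrative profile name
        "cooldown_s": 300,
        "victim_selection": "lifo",  # newest hosts removed first
        "mode": "auto",              # e.g. "static" would pin the count
    },
    "reconciliation": {
        "metric_window_s": 60,       # rolling aggregation window
        "scheduler_interval_s": 60,  # roughly per-minute cadence
    },
}
```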
Provider resolution
Creating hosts through a scaling group may claim capacity from operator-managed compute pools (see Compute provider model). If your workspace lacks eligible provider capacity, scale-out stalls with capacity-style errors; the fix is operator-side (more pool capacity, a different region, or a higher quota).
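As a rough illustration of that failure mode, the sketch below shows scale-out stalling when no eligible pool has free capacity. CapacityError, claim_host, and the pool fields are hypothetical names, not the platform's API.

```python
class CapacityError(Exception):
    """No eligible provider pool can satisfy the request (operator fix needed)."""

def claim_host(profile: str, pools: list[dict]) -> str:
    # Try each operator-managed pool that matches the compute profile.
    for pool in pools:
        if pool["profile"] == profile and pool["free"] > 0:
            pool["free"] -= 1
            return f"host-in-{pool['region']}"
    # The scheduler cannot fix this on its own: it needs more pool capacity,
    # a different region, or a higher quota, so the group stays under target.
    raise CapacityError(f"no eligible capacity for profile {profile!r}")

pools = [{"profile": "general-8cpu", "region": "eu-west", "free": 0}]
try:
    claim_host("general-8cpu", pools)
except CapacityError as e:
    print(e)  # surfaces as a capacity-style error on the scaling group
```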
Permissions and audit
Scaling group APIs are workspace-scoped capabilities: creating a group, editing thresholds, and forcing scale actions should be RBAC-gated separately from read-only monitoring roles.
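One way to model that separation, with capability names that are purely illustrative rather than the platform's actual RBAC vocabulary:

```python
# Read-only monitoring role vs a write-capable operator role.
MONITOR_ROLE = {"scaling_group.read"}
OPERATOR_ROLE = MONITOR_ROLE | {
    "scaling_group.create",
    "scaling_group.edit_thresholds",
    "scaling_group.force_scale",
}

def can(role: set[str], capability: str) -> bool:
    return capability in role

assert can(OPERATOR_ROLE, "scaling_group.force_scale")
assert not can(MONITOR_ROLE, "scaling_group.force_scale")  # monitoring stays read-only
```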
For enterprise patterns (SoD, break-glass, audit expectations), see Auditing and fine-grained access.
Operational guidance
- Tune thresholds to avoid flapping on bursty workloads; use longer smoothing windows if metrics are noisy (see the smoothing sketch after this list).
- Keep max fleet size aligned with budget and quota agreements; pair with cost alerts where available.
- Remember that downscale often uses last-in, first-out host selection in default templates; verify that this matches your stateful workload policy.
- Test scale-out and scale-in in staging with realistic traffic; synthetic tiny tests miss provider timeouts and image pull delays.
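A minimal sketch of the smoothing idea, assuming per-minute CPU samples; SmoothedMetric and the window size are illustrative choices, not platform settings:

```python
from collections import deque
from statistics import mean

class SmoothedMetric:
    """Compare thresholds against a rolling average, not raw samples."""
    def __init__(self, window: int = 10):
        self.samples = deque(maxlen=window)  # longer window = less flapping

    def observe(self, cpu_pct: float) -> float:
        self.samples.append(cpu_pct)
        return mean(self.samples)

m = SmoothedMetric(window=10)
# A single 99% spike no longer trips a high-90s upscale threshold:
for cpu in (40, 42, 38, 99, 41):
    avg = m.observe(cpu)
print(round(avg, 1))  # 52.0, well under the upscale threshold
```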