[serve][6/n] Gang scheduling -- autoscaling #61467

Open
jeffreywang-anyscale wants to merge 1 commit into master from gang-scheduling-part4-autoscaling

@jeffreywang-anyscale

Description

Adds gang-aware autoscaling: enables num_replicas="auto" with gang scheduling so that deployments can autoscale while respecting gang boundaries.

Approach

  • Removed the restriction that num_replicas="auto" is not allowed with gang scheduling.
  • apply_bounds() now aligns replica counts to gang_size multiples: rounds up when scaling up (to ensure enough capacity) and rounds down when scaling down (to release only complete gangs).
  • Schema validation enforces that min_replicas, max_replicas, and initial_replicas are multiples of gang_size.
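
The rounding rule in the second bullet can be sketched as follows. This is a minimal illustration, not the PR's actual apply_bounds() code: the function name align_to_gang and its parameters are hypothetical, and it assumes min_replicas and max_replicas are already multiples of gang_size, as the schema validation requires.

```python
def align_to_gang(target, current, min_replicas, max_replicas, gang_size):
    """Clamp target to [min_replicas, max_replicas] and align it to gang_size.

    Hypothetical helper for illustration only. Assumes min_replicas and
    max_replicas are themselves multiples of gang_size.
    """
    # Clamp the raw autoscaler decision to the configured bounds first.
    bounded = max(min_replicas, min(max_replicas, target))
    remainder = bounded % gang_size
    if remainder == 0:
        return bounded
    if bounded > current:
        # Scaling up: round up to the next multiple to ensure enough capacity.
        aligned = bounded + (gang_size - remainder)
    else:
        # Scaling down (or holding): round down so only complete gangs are released.
        aligned = bounded - remainder
    # Safety net: with gang-aligned bounds this clamp changes nothing.
    return max(min_replicas, min(max_replicas, aligned))
```

For example, with gang_size=4 and bounds [4, 8], a raw target of 5 while running 4 replicas becomes 8, and a raw target of 7 while running 8 replicas becomes 4.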

Test Plan

Unit Tests

| Category | Test | Description |
| --- | --- | --- |
| Schema validation | TestDeploymentSchema::test_gang_scheduling_config_auto_replicas_accepted | num_replicas="auto" accepted with valid autoscaling bounds |
| Schema validation | TestDeploymentSchema::test_gang_scheduling_config_auto_replicas_invalid_bounds | Rejects min_replicas, max_replicas, initial_replicas not divisible by gang_size |
| Config validation | TestGangSchedulingConfig::test_gang_scheduling_config_auto_num_replicas | num_replicas="auto" accepted via @serve.deployment |
| Config validation | TestGangSchedulingConfig::test_gang_scheduling_config_auto_num_replicas_via_options | num_replicas="auto" accepted via .options() |
| Gang autoscaling bounds | TestGangSchedulingAutoscaling::test_scale_up_rounds_up | Scale-up aligns to next gang_size multiple |
| Gang autoscaling bounds | TestGangSchedulingAutoscaling::test_scale_down_rounds_down | Scale-down aligns to previous gang_size multiple |
| Gang autoscaling bounds | TestGangSchedulingAutoscaling::test_scale_down_respects_min | Round-down never goes below min_replicas |
| Gang autoscaling bounds | TestGangSchedulingAutoscaling::test_scale_up_respects_max | Round-up never exceeds max_replicas |
| Gang autoscaling bounds | TestGangSchedulingAutoscaling::test_no_gang | Non-gang deployments are unaffected |

Integration Tests

| Test | Description |
| --- | --- |
| TestGangScaling::test_gang_autoscaling | Autoscaling scales up to max under load and scales down when drained; replica count is always a multiple of gang_size |
| TestGangScaling::test_gang_autoscaling_unaligned_upscale | Autoscaling rounds up the number of target replicas during upscaling |
| TestGangScaling::test_gang_autoscaling_unaligned_downscale | Autoscaling rounds down the number of target replicas during downscaling |

Related issues

RFC: #60873
Precedent: #61215

@jeffreywang-anyscale requested a review from a team as a code owner on March 3, 2026 22:11
@jeffreywang-anyscale added the labels serve (Ray Serve Related Issue) and go (add ONLY when ready to merge, run all tests) on Mar 3, 2026

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request enables autoscaling for gang-scheduled deployments, a significant feature enhancement. The changes correctly remove the previous restriction and introduce logic to align replica counts with gang_size during scaling operations, ensuring gang boundaries are respected. The implementation is clean, with rounding up for scale-ups and rounding down for scale-downs. Schema validations are also updated to enforce that autoscaling bounds (min_replicas, max_replicas, initial_replicas) are multiples of gang_size. The addition of comprehensive unit and integration tests is excellent and covers various scenarios, including unaligned scaling targets. Overall, this is a high-quality contribution. I have one minor suggestion to improve a comment for clarity.

Note: Security Review did not run due to the size of the PR.

and isinstance(new_deployment_config.num_replicas, int)
and new_deployment_config.autoscaling_config is None
):
# When autoscaling is enabled, num_replicas defaults to 1

medium

The comment on this line is a bit confusing. It says "When autoscaling is enabled...", but this code block is executed only when new_deployment_config.autoscaling_config is None (i.e., when autoscaling is disabled). This might confuse future readers.

A clearer comment could explain that this check is for fixed-replica deployments.

Suggested change
- # When autoscaling is enabled, num_replicas defaults to 1
+ # For fixed-replica deployments, num_replicas must be a multiple of gang_size.

autoscaling_config={"min_replicas": 4, "max_replicas": 8},
)
assert f2._deployment_config.autoscaling_config is not None
assert f._deployment_config.gang_scheduling_config.gang_size == 4

Test asserts on wrong variable, missing coverage for f2

Medium Severity

In test_gang_scheduling_config_auto_num_replicas_via_options, line 848 asserts f._deployment_config.gang_scheduling_config.gang_size == 4 on the original deployment f instead of the new deployment f2 returned by .options(). The test is meant to verify that gang_scheduling_config is preserved through .options(), but checking the original f always passes trivially. If .options() ever regressed and dropped the gang config from the copy, this test would not catch it.
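
The failure mode described above can be reproduced in isolation. The sketch below uses a hypothetical FakeDeployment stand-in (not a Ray class) whose options() deliberately drops the gang configuration, to show that asserting on the original object cannot detect such a regression.

```python
import copy
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeDeployment:
    """Hypothetical stand-in for a Serve deployment; not part of Ray."""

    gang_size: Optional[int] = None

    def options(self, **kwargs):
        # Simulated regression: the returned copy loses its gang configuration.
        new = copy.copy(self)
        new.gang_size = None
        return new


f = FakeDeployment(gang_size=4)
f2 = f.options(num_replicas="auto")

# Mirrors the assertion as written in the test: it inspects the original
# object, so it passes even though options() dropped the gang config.
assert f.gang_size == 4

# Inspecting the returned copy is what exposes the regression.
regression_detected = f2.gang_size != 4
```

In the real test, the corresponding fix would presumably be to assert on f2._deployment_config.gang_scheduling_config.gang_size instead.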


raise ValueError(
f"num_replicas ({num_replicas}) must be a multiple of "
f"gang_size ({v.gang_size})."
f"num_replicas ({values['num_replicas']}) must be a multiple of gang_size ({v.gang_size})."

Missing autoscaling bounds gang_size validation in Python API

Medium Severity

The validate_gang_scheduling_config validator in DeploymentConfig skips all checks when autoscaling_config is present but never validates that min_replicas, max_replicas, and initial_replicas are multiples of gang_size. This validation only exists in schema.py (REST API path), so users of the Python API (@serve.deployment() or .options()) can create misaligned configurations. When apply_bounds later clips to non-gang-aligned bounds, the resulting replica count won't be a multiple of gang_size, defeating gang scheduling.
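
A check of the kind this comment asks for might look like the sketch below. The function check_gang_aligned_bounds is hypothetical and is not the validator in DeploymentConfig; the error message mirrors the wording used in the schema.py diff quoted elsewhere in this review.

```python
from typing import Optional


def check_gang_aligned_bounds(
    gang_size: int,
    min_replicas: int,
    max_replicas: int,
    initial_replicas: Optional[int] = None,
) -> None:
    """Raise ValueError if any autoscaling bound is not a multiple of gang_size.

    Hypothetical helper for illustration only.
    """
    bounds = {
        "min_replicas": min_replicas,
        "max_replicas": max_replicas,
        "initial_replicas": initial_replicas,
    }
    for field, val in bounds.items():
        # initial_replicas is optional, so skip it when it is unset.
        if val is not None and val % gang_size != 0:
            raise ValueError(
                f"autoscaling_config.{field} ({val}) must be a "
                f"multiple of gang_size ({gang_size})."
            )
```

Without such a check, a configuration like gang_size=4 with max_replicas=6 would let clipping produce 6 replicas, which is one complete gang plus two stray replicas.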


Allow num_replicas="auto" with gang_scheduling_config. The autoscaler's
apply_bounds() now aligns replica counts to gang_size multiples: ceil for
upscale, floor for downscale. Schema validation ensures min/max/initial
replicas are multiples of gang_size.

Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
@jeffreywang-anyscale force-pushed the gang-scheduling-part4-autoscaling branch from e588662 to 200f128 on March 4, 2026 04:24
@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


raise ValueError(
f"autoscaling_config.{field} ({val}) must be a "
f"multiple of gang_size ({gang_config.gang_size})."
)

Schema validation misses default autoscaling bound values

Medium Severity

The gang-size validation in schema.py uses autoscaling_config.get(field) on the raw user-provided dict, which returns None for fields the user didn't explicitly set. Default values like min_replicas=1 and max_replicas=1 from AutoscalingConfig are never checked against gang_size. A user providing autoscaling_config={"target_ongoing_requests": 5} with gang_size=4 would bypass validation entirely, yet the effective defaults (1, 1) aren't multiples of 4.
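
The bypass can be illustrated with plain dictionaries. In the sketch below, ASSUMED_DEFAULTS takes the default values (min_replicas=1, max_replicas=1) from the comment above rather than from the AutoscalingConfig source, and both helper functions are hypothetical.

```python
# Defaults as stated in the review comment; treated here as an assumption.
ASSUMED_DEFAULTS = {"min_replicas": 1, "max_replicas": 1, "initial_replicas": None}
FIELDS = ("min_replicas", "max_replicas", "initial_replicas")


def misaligned_raw(user_config, gang_size):
    """Check only the fields the user set explicitly (the pattern under review)."""
    return [
        field
        for field in FIELDS
        if user_config.get(field) is not None and user_config[field] % gang_size != 0
    ]


def misaligned_effective(user_config, gang_size):
    """Merge the assumed defaults first, then check the effective values."""
    effective = {**ASSUMED_DEFAULTS, **user_config}
    return [
        field
        for field in FIELDS
        if effective.get(field) is not None and effective[field] % gang_size != 0
    ]


user_config = {"target_ongoing_requests": 5}
raw_result = misaligned_raw(user_config, gang_size=4)
effective_result = misaligned_effective(user_config, gang_size=4)
print(raw_result)        # []
print(effective_result)  # ['min_replicas', 'max_replicas']
```

Checking the effective values rather than the raw dictionary is one way to close the gap; validating the constructed AutoscalingConfig object would be another.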

