[serve][6/n] Gang scheduling -- autoscaling by jeffreywang-anyscale · Pull Request #61467 · ray-project/ray

jeffreywang-anyscale · 2026-03-03T22:11:40Z

Description

Adds gang-aware autoscaling: enables num_replicas="auto" with gang scheduling so that deployments can autoscale while respecting gang boundaries.

Approach

- Removed the restriction that num_replicas="auto" is not allowed with gang scheduling.
- apply_bounds() now aligns replica counts to gang_size multiples: it rounds up when scaling up (to ensure enough capacity) and rounds down when scaling down (to release only complete gangs).
- Schema validation enforces that min_replicas, max_replicas, and initial_replicas are multiples of gang_size.
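The rounding rule described above can be sketched as a standalone function. This is a minimal illustration only, not the PR's actual apply_bounds() code: the function name `align_to_gang_size`, its parameters, and the choice to round down when the target equals the current count are all assumptions made for the example.

```python
from typing import Optional


def align_to_gang_size(
    target: int,
    current: int,
    gang_size: Optional[int],
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Hypothetical helper: clamp a target replica count, then align it to a gang_size multiple."""
    # Clamp to the configured autoscaling bounds first.
    bounded = max(min_replicas, min(max_replicas, target))

    # Deployments without gang scheduling are left unchanged.
    if not gang_size or gang_size <= 1:
        return bounded

    if bounded > current:
        # Scaling up: round up to the next multiple so there is enough capacity.
        aligned = -(-bounded // gang_size) * gang_size
    else:
        # Scaling down (or holding): round down so only complete gangs are released.
        aligned = (bounded // gang_size) * gang_size

    # With bounds that are themselves multiples of gang_size this is a no-op,
    # but it keeps the result inside [min_replicas, max_replicas] regardless.
    return max(min_replicas, min(max_replicas, aligned))
```

Because the bounds are validated to be multiples of gang_size, rounding a clamped value up cannot exceed max_replicas and rounding it down cannot fall below min_replicas; the final clamp is only a safety net.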

Test Plan

Unit Tests

| Category | Test | Description |
| --- | --- | --- |
| Schema validation | `TestDeploymentSchema::test_gang_scheduling_config_auto_replicas_accepted` | `num_replicas="auto"` accepted with valid autoscaling bounds |
| Schema validation | `TestDeploymentSchema::test_gang_scheduling_config_auto_replicas_invalid_bounds` | Rejects `min_replicas`, `max_replicas`, `initial_replicas` not divisible by `gang_size` |
| Config validation | `TestGangSchedulingConfig::test_gang_scheduling_config_auto_num_replicas` | `num_replicas="auto"` accepted via `@serve.deployment` |
| Config validation | `TestGangSchedulingConfig::test_gang_scheduling_config_auto_num_replicas_via_options` | `num_replicas="auto"` accepted via `.options()` |
| Gang autoscaling bounds | `TestGangSchedulingAutoscaling::test_scale_up_rounds_up` | Scale-up aligns to next `gang_size` multiple |
| Gang autoscaling bounds | `TestGangSchedulingAutoscaling::test_scale_down_rounds_down` | Scale-down aligns to previous `gang_size` multiple |
| Gang autoscaling bounds | `TestGangSchedulingAutoscaling::test_scale_down_respects_min` | Round-down never goes below `min_replicas` |
| Gang autoscaling bounds | `TestGangSchedulingAutoscaling::test_scale_up_respects_max` | Round-up never exceeds `max_replicas` |
| Gang autoscaling bounds | `TestGangSchedulingAutoscaling::test_no_gang` | Non-gang deployments are unaffected |

Integration Tests

| Test | Description |
| --- | --- |
| `TestGangScaling::test_gang_autoscaling` | Autoscaling scales up to max under load and scales down when drained; the replica count is always a multiple of `gang_size` |
| `TestGangScaling::test_gang_autoscaling_unaligned_upscale` | Autoscaling rounds up the number of target replicas during upscaling |
| `TestGangScaling::test_gang_autoscaling_unaligned_downscale` | Autoscaling rounds down the number of target replicas during downscaling |

Related issues

RFC: #60873
Precedent: #61215

gemini-code-assist

Code Review

This pull request enables autoscaling for gang-scheduled deployments, a significant feature enhancement. The changes correctly remove the previous restriction and introduce logic to align replica counts with gang_size during scaling operations, ensuring gang boundaries are respected. The implementation is clean, with rounding up for scale-ups and rounding down for scale-downs. Schema validations are also updated to enforce that autoscaling bounds (min_replicas, max_replicas, initial_replicas) are multiples of gang_size. The addition of comprehensive unit and integration tests is excellent and covers various scenarios, including unaligned scaling targets. Overall, this is a high-quality contribution. I have one minor suggestion to improve a comment for clarity.

Note: Security Review did not run due to the size of the PR.

gemini-code-assist · 2026-03-03T22:16:51Z

python/ray/serve/deployment.py

+            and isinstance(new_deployment_config.num_replicas, int)
+            and new_deployment_config.autoscaling_config is None
+        ):
+            # When autoscaling is enabled, num_replicas defaults to 1


The comment on this line is a bit confusing. It says "When autoscaling is enabled...", but this code block is executed only when new_deployment_config.autoscaling_config is None (i.e., when autoscaling is disabled). This might confuse future readers.

A clearer comment could explain that this check is for fixed-replica deployments.

Suggested change

-            # When autoscaling is enabled, num_replicas defaults to 1
+            # For fixed-replica deployments, num_replicas must be a multiple of gang_size.

cursor · 2026-03-03T22:22:14Z

python/ray/serve/tests/unit/test_config.py

+            autoscaling_config={"min_replicas": 4, "max_replicas": 8},
+        )
+        assert f2._deployment_config.autoscaling_config is not None
+        assert f._deployment_config.gang_scheduling_config.gang_size == 4


Test asserts on wrong variable, missing coverage for f2

Medium Severity

In test_gang_scheduling_config_auto_num_replicas_via_options, line 848 asserts f._deployment_config.gang_scheduling_config.gang_size == 4 on the original deployment f instead of the new deployment f2 returned by .options(). The test is meant to verify that gang_scheduling_config is preserved through .options(), but checking the original f always passes trivially. If .options() ever regressed and dropped the gang config from the copy, this test would not catch it.
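To illustrate why the assertion target matters, here is a small stand-in that does not use Ray at all. `FakeDeployment`, `options`, and `broken_options` are hypothetical names invented for this sketch; `broken_options` simulates a regression in which the copy loses its gang configuration.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class FakeDeployment:
    """Hypothetical stand-in for a deployment object; not Ray Serve's class."""

    gang_size: Optional[int] = None
    autoscaling: Optional[dict] = None

    def options(self, **overrides) -> "FakeDeployment":
        # Correct behaviour: the copy keeps every field that is not overridden.
        return replace(self, **overrides)

    def broken_options(self, **overrides) -> "FakeDeployment":
        # Simulated regression: the copy silently drops the gang configuration.
        return FakeDeployment(
            gang_size=None,
            autoscaling=overrides.get("autoscaling", self.autoscaling),
        )


f = FakeDeployment(gang_size=4)
f2 = f.broken_options(autoscaling={"min_replicas": 4, "max_replicas": 8})

# Asserting on the original object passes even though the copy is wrong.
original_check_passes = f.gang_size == 4
# Asserting on the copy is the check that exposes the regression.
copy_check_passes = f2.gang_size == 4
```

In this sketch `original_check_passes` is true while `copy_check_passes` is false, which is the gap the review comment points out: only an assertion on `f2` exercises what `.options()` returned.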

cursor · 2026-03-03T22:22:14Z

python/ray/serve/_private/config.py

            raise ValueError(
-                f"num_replicas ({num_replicas}) must be a multiple of "
-                f"gang_size ({v.gang_size})."
+                f"num_replicas ({values['num_replicas']}) must be a multiple of gang_size ({v.gang_size})."


Missing autoscaling bounds gang_size validation in Python API

Medium Severity

The validate_gang_scheduling_config validator in DeploymentConfig skips all checks when autoscaling_config is present but never validates that min_replicas, max_replicas, and initial_replicas are multiples of gang_size. This validation only exists in schema.py (REST API path), so users of the Python API (@serve.deployment() or .options()) can create misaligned configurations. When apply_bounds later clips to non-gang-aligned bounds, the resulting replica count won't be a multiple of gang_size, defeating gang scheduling.
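One way to close this gap would be a single check that both the REST schema path and the Python API path could call. The sketch below is an assumption about how such a helper might look: `validate_gang_autoscaling_bounds` is a hypothetical name, and it takes already-resolved bound values rather than any real Ray Serve config object.

```python
from typing import Optional


def validate_gang_autoscaling_bounds(
    gang_size: int,
    min_replicas: int,
    max_replicas: int,
    initial_replicas: Optional[int] = None,
) -> None:
    """Hypothetical shared check: every autoscaling bound must be a gang_size multiple."""
    bounds = {
        "min_replicas": min_replicas,
        "max_replicas": max_replicas,
        "initial_replicas": initial_replicas,
    }
    for field, val in bounds.items():
        # initial_replicas may be unset; only check values that are present.
        if val is not None and val % gang_size != 0:
            raise ValueError(
                f"autoscaling_config.{field} ({val}) must be a "
                f"multiple of gang_size ({gang_size})."
            )
```

Calling it with `gang_size=4, min_replicas=4, max_replicas=8` returns silently, while `gang_size=4, min_replicas=2, max_replicas=8` raises a ValueError naming `min_replicas`.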

Allow num_replicas="auto" with gang_scheduling_config. The autoscaler's apply_bounds() now aligns replica counts to gang_size multiples: ceil for upscale, floor for downscale. Schema validation ensures min/max/initial replicas are multiples of gang_size. Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


cursor · 2026-03-04T04:28:16Z

python/ray/serve/schema.py

+                        raise ValueError(
+                            f"autoscaling_config.{field} ({val}) must be a "
+                            f"multiple of gang_size ({gang_config.gang_size})."
+                        )


Schema validation misses default autoscaling bound values

Medium Severity

The gang-size validation in schema.py uses autoscaling_config.get(field) on the raw user-provided dict, which returns None for fields the user didn't explicitly set. Default values like min_replicas=1 and max_replicas=1 from AutoscalingConfig are never checked against gang_size. A user providing autoscaling_config={"target_ongoing_requests": 5} with gang_size=4 would bypass validation entirely, yet the effective defaults (1, 1) aren't multiples of 4.
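The bypass can be reproduced with plain dictionaries. In this sketch, `ASSUMED_DEFAULTS` and `misaligned_bounds` are hypothetical names; the defaults for min_replicas and max_replicas are the ones quoted in the comment above, and `initial_replicas` defaulting to None is an assumption. The idea is to merge defaults into the user-provided dict before checking, so that unset fields are validated at their effective values.

```python
from typing import Any, Dict, List

# Assumed effective defaults; min/max are taken from the review comment,
# initial_replicas=None is an assumption for this example.
ASSUMED_DEFAULTS: Dict[str, Any] = {
    "min_replicas": 1,
    "max_replicas": 1,
    "initial_replicas": None,
}


def misaligned_bounds(raw_config: Dict[str, Any], gang_size: int) -> List[str]:
    """Return the bound fields whose effective value is not a gang_size multiple."""
    # User-provided values override the defaults; unset fields fall back to them.
    effective = {**ASSUMED_DEFAULTS, **raw_config}
    return [
        field
        for field in ASSUMED_DEFAULTS
        if effective[field] is not None and effective[field] % gang_size != 0
    ]


raw = {"target_ongoing_requests": 5}
# Checking the raw dict finds nothing, because .get() returns None for unset fields.
raw_lookup = [raw.get(field) for field in ASSUMED_DEFAULTS]
# Checking the effective values flags the default bounds.
flagged = misaligned_bounds(raw, gang_size=4)
```

Here `raw_lookup` contains only None values, so a check on the raw dict has nothing to reject, whereas `flagged` lists both `min_replicas` and `max_replicas`.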

jeffreywang-anyscale requested a review from a team as a code owner March 3, 2026 22:11

jeffreywang-anyscale added the labels serve (Ray Serve Related Issue) and go (add ONLY when ready to merge, run all tests) Mar 3, 2026

jeffreywang-anyscale assigned abrarsheikh and unassigned abrarsheikh Mar 3, 2026

jeffreywang-anyscale requested a review from abrarsheikh March 3, 2026 22:12

gemini-code-assist bot reviewed Mar 3, 2026


cursor bot reviewed Mar 3, 2026


jeffreywang-anyscale force-pushed the gang-scheduling-part4-autoscaling branch from e588662 to 200f128 Compare March 4, 2026 04:24

cursor bot reviewed Mar 4, 2026


jeffreywang-anyscale mentioned this pull request Mar 4, 2026

[serve][llm] Introduce DP group fault tolerance #61480

Open

13 tasks

jeffreywang-anyscale wants to merge 1 commit into master from gang-scheduling-part4-autoscaling
