Skip to content

[Data] Add ClusterUtil dataclass for ResourceUtilizationGauge#61436

Open
bveeramani wants to merge 2 commits intomasterfrom
add-cluster-util
Open

[Data] Add ClusterUtil dataclass for ResourceUtilizationGauge#61436
bveeramani wants to merge 2 commits intomasterfrom
add-cluster-util

Conversation

@bveeramani
Copy link
Member

ExecutionResources can't represent object store memory and memory utilization correctly because it rounds values to the nearest integer. So, the utilization for those resources is always either 0 or 1.

To ensure we can express the full [0, 1] range of resource utilization, this PR introduces a new ClusterUtil dataclass rather than re-using ExecutionResources.

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
@bveeramani bveeramani requested a review from a team as a code owner March 3, 2026 02:21
@bveeramani bveeramani requested a review from owenowenisme March 3, 2026 02:22
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new ClusterUtil dataclass to correctly handle floating-point resource utilization values, which is a good improvement. The implementation is sound. I've added a couple of suggestions to refactor the validation logic in the new dataclass to improve its maintainability by reducing code duplication.

@@ -1,12 +1,26 @@
import abc
import math
from dataclasses import dataclass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To support iterating over dataclass fields for validation in __post_init__, please also import fields here. This will be used in the refactoring of __post_init__.

Suggested change
from dataclasses import dataclass
from dataclasses import dataclass, fields

Comment on lines +18 to +23
def __post_init__(self):
# If we overcommit tasks, the logical utilization can exceed 1.0.
assert math.isfinite(self.cpu) and 0 <= self.cpu
assert math.isfinite(self.gpu) and 0 <= self.gpu
assert math.isfinite(self.memory) and 0 <= self.memory
assert math.isfinite(self.object_store_memory) and 0 <= self.object_store_memory
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To improve maintainability and reduce code duplication, you can iterate over the dataclass fields to perform the validation. This approach will automatically include any new fields added to the dataclass in the future, making the code more robust.

Suggested change
def __post_init__(self):
# If we overcommit tasks, the logical utilization can exceed 1.0.
assert math.isfinite(self.cpu) and 0 <= self.cpu
assert math.isfinite(self.gpu) and 0 <= self.gpu
assert math.isfinite(self.memory) and 0 <= self.memory
assert math.isfinite(self.object_store_memory) and 0 <= self.object_store_memory
def __post_init__(self):
# If we overcommit tasks, the logical utilization can exceed 1.0.
for f in fields(self):
value = getattr(self, f.name)
assert math.isfinite(value) and value >= 0, f"Invalid value for {f.name}: {value}"

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
@bveeramani bveeramani enabled auto-merge (squash) March 3, 2026 02:29
@github-actions github-actions bot added the go add ONLY when ready to merge, run all tests label Mar 3, 2026
@ray-gardener ray-gardener bot added the data Ray Data-related issues label Mar 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

data Ray Data-related issues go add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants