[Data] Add ClusterUtil dataclass for ResourceUtilizationGauge#61436
[Data] Add ClusterUtil dataclass for ResourceUtilizationGauge#61436bveeramani wants to merge 2 commits intomasterfrom
ClusterUtil dataclass for ResourceUtilizationGauge#61436Conversation
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
There was a problem hiding this comment.
Code Review
This pull request introduces a new ClusterUtil dataclass to correctly handle floating-point resource utilization values, which is a good improvement. The implementation is sound. I've added a couple of suggestions to refactor the validation logic in the new dataclass to improve its maintainability by reducing code duplication.
| @@ -1,12 +1,26 @@ | |||
| import abc | |||
| import math | |||
| from dataclasses import dataclass | |||
| def __post_init__(self): | ||
| # If we overcommit tasks, the logical utilization can exceed 1.0. | ||
| assert math.isfinite(self.cpu) and 0 <= self.cpu | ||
| assert math.isfinite(self.gpu) and 0 <= self.gpu | ||
| assert math.isfinite(self.memory) and 0 <= self.memory | ||
| assert math.isfinite(self.object_store_memory) and 0 <= self.object_store_memory |
There was a problem hiding this comment.
To improve maintainability and reduce code duplication, you can iterate over the dataclass fields to perform the validation. This approach will automatically include any new fields added to the dataclass in the future, making the code more robust.
| def __post_init__(self): | |
| # If we overcommit tasks, the logical utilization can exceed 1.0. | |
| assert math.isfinite(self.cpu) and 0 <= self.cpu | |
| assert math.isfinite(self.gpu) and 0 <= self.gpu | |
| assert math.isfinite(self.memory) and 0 <= self.memory | |
| assert math.isfinite(self.object_store_memory) and 0 <= self.object_store_memory | |
| def __post_init__(self): | |
| # If we overcommit tasks, the logical utilization can exceed 1.0. | |
| for f in fields(self): | |
| value = getattr(self, f.name) | |
| assert math.isfinite(value) and value >= 0, f"Invalid value for {f.name}: {value}" |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
python/ray/data/_internal/cluster_autoscaler/resource_utilization_gauge.py
Show resolved
Hide resolved
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
ExecutionResourcescan't represent object store memory and memory utilization correctly because it rounds values to the nearest integer. So, the utilization for those resources is always either 0 or 1.To ensure we can express the full [0, 1] range of resource utilization, this PR introduces a new
ClusterUtildataclass rather than re-usingExecutionResources.