Kueue does not remove the scheduling gate from Ray’s redis-cleanup jobs

**What happened**:
In a K8s cluster with Kueue and KubeRay 1.4.0, I deployed a Ray Serve workload via a Kueue-managed queue, with the `kueue.x-k8s.io/elastic-job: "true"` annotation and `spec.enableInTreeAutoscaling: true`. When the Ray Serve workload is terminated, KubeRay launches a redis-cleanup batch job, which runs a pod to clean up the Redis config used by the Ray Cluster (RC) that the Ray Serve workload launched.
With Kueue, the redis-cleanup pod remains schedule-gated and stays in the Pending state. Some logic in KubeRay eventually kills it, but the cleanup never happens.
This is because Kueue does not remove the pod scheduling gate from the redis-cleanup pod. Analysis below.

**What you expected to happen**:
I expected the redis-cleanup pod to run to completion and the RC to exit gracefully. 

**How to reproduce it (as minimally and precisely as possible)**:
- Deploy a RayService CR via a Kueue-managed queue, with the annotation `kueue.x-k8s.io/elastic-job: "true"` and `spec.enableInTreeAutoscaling: true`.
- Terminate the RayService deployment.

**Anything else we need to know?**:
Here's my analysis so far:
  - The user submits a RayService CR with the above specs.
  - KubeRay's RayService controller creates a Ray Cluster (RC) from it.
  - Kueue's Ray Cluster webhook intercepts it and sets the pod scheduling gate in the RC's pod spec.
             schedulingGates:
                 - name: kueue.x-k8s.io/elastic-job
  - When the RC is terminated, KubeRay creates a batch/v1/Job named redis-cleanup, apparently from the head group spec of the RC. This job inherits the scheduling gate.
  - The batch/v1/Job creates a redis-cleanup pod, which also inherits the scheduling gate. This pod's owner reference is the batch/v1/Job, not the Ray Cluster.
  - Kueue reacts but does not remove the scheduling gate from the pod, because the pod is owned by the Job, not the RC. Kueue only looks for pods owned by the RC.
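
To make the ownership chain in the steps above concrete, here is a minimal Python sketch using plain dicts as stand-ins for the three objects. The object names and UIDs are hypothetical, and `direct_owner_match` is only a simplification of the "pods owned by the RC" lookup described above, not Kueue's actual code.

```python
# Simplified stand-ins for the objects in the analysis; names/UIDs are hypothetical.
GATE = "kueue.x-k8s.io/elastic-job"

ray_cluster = {
    "apiVersion": "ray.io/v1",
    "kind": "RayCluster",
    "metadata": {"name": "demo-rc", "uid": "rc-uid"},
}

cleanup_job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {
        "name": "demo-rc-redis-cleanup",
        "uid": "job-uid",
        "ownerReferences": [{"kind": "RayCluster", "uid": "rc-uid"}],
    },
}

cleanup_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "demo-rc-redis-cleanup-abcde",
        "uid": "pod-uid",
        # The pod's owner is the Job, not the RayCluster.
        "ownerReferences": [{"kind": "Job", "uid": "job-uid"}],
    },
    # Gate inherited from the RC head group spec via the Job template.
    "spec": {"schedulingGates": [{"name": GATE}]},
}

by_uid = {o["metadata"]["uid"]: o for o in (ray_cluster, cleanup_job, cleanup_pod)}


def direct_owner_match(pod, kind):
    """True only if one of the pod's *direct* owners has the given kind."""
    refs = pod["metadata"].get("ownerReferences", [])
    return any(ref["kind"] == kind for ref in refs)


def root_owner(obj):
    """Follow the first ownerReference upwards until an object has no owner."""
    while obj["metadata"].get("ownerReferences"):
        obj = by_uid[obj["metadata"]["ownerReferences"][0]["uid"]]
    return obj


print(direct_owner_match(cleanup_pod, "RayCluster"))  # direct owner is the Job
print(root_owner(cleanup_pod)["kind"])                # transitive owner
```

A lookup keyed on the direct owner misses the redis-cleanup pod even though the RC is its transitive owner, which matches the gate being left in place.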

*Candidate solution*: Modify the batch job webhook in Kueue to remove the scheduling gate from the job if (a) it is owned by a ray.io/v1/RayCluster, AND (b) it has the label `ray.io/node-type=redis-cleanup`.

*Rationale for the solution*: Kueue does not need to handle autoscaling for the redis-cleanup pod, so it need not be treated as an elastic job.
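
The decision logic of the candidate solution could look roughly like the sketch below. It is written in Python over plain dicts purely for illustration; Kueue's webhook is not implemented this way, and the function names `is_redis_cleanup_job` and `remove_elastic_gate` are hypothetical. It also assumes the `ray.io/node-type` label sits on the Job's own metadata, which I have not verified.

```python
import copy

ELASTIC_JOB_GATE = "kueue.x-k8s.io/elastic-job"


def is_redis_cleanup_job(job):
    """Conditions (a) and (b): owned by a ray.io/v1 RayCluster AND labeled redis-cleanup."""
    meta = job.get("metadata", {})
    owned_by_rc = any(
        ref.get("apiVersion") == "ray.io/v1" and ref.get("kind") == "RayCluster"
        for ref in meta.get("ownerReferences", [])
    )
    # Assumption: the label is on the Job metadata (not only on the pod template).
    labeled = meta.get("labels", {}).get("ray.io/node-type") == "redis-cleanup"
    return owned_by_rc and labeled


def remove_elastic_gate(job):
    """Return a copy of the Job with the elastic-job gate dropped from its pod template.

    Jobs that do not satisfy both conditions are returned unchanged.
    """
    if not is_redis_cleanup_job(job):
        return job
    patched = copy.deepcopy(job)
    pod_spec = patched["spec"]["template"]["spec"]
    remaining = [
        gate for gate in pod_spec.get("schedulingGates", [])
        if gate.get("name") != ELASTIC_JOB_GATE
    ]
    if remaining:
        pod_spec["schedulingGates"] = remaining
    else:
        pod_spec.pop("schedulingGates", None)
    return patched
```

Only the elastic-job gate is filtered out, so any other scheduling gate on the pod template would be preserved.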

**Environment**:
- Kubernetes version (use `kubectl version`): 
   - Client Version: v1.32.2
   - Kustomize Version: v5.5.0
   - Server Version: v1.32.9-eks-3025e55
- Kueue version (use `git describe --tags --dirty --always`): 0.14.3 
- Cloud provider or hardware configuration: AWS EKS v1.32.9 
- OS (e.g: `cat /etc/os-release`): 
- Kernel (e.g. `uname -a`):
- Install tools:
- Others:

Kueue does not remove the scheduling gate from Ray’s redis-cleanup jobs #8443
