karpenter_nodepools_allowed_disruptions metric does not match the actual budget schedule, or the schedule is not working #2344

@ipleten

Description

Observed Behavior:
I have:

spec:
  disruption:
    budgets:
    - duration: 12h00m0s
      nodes: "0"
      reasons:
      - Drifted
      - Empty
      - Underutilized
      schedule: 0 10 * * mon-fri
    - nodes: "2"
    consolidateAfter: 30s

But if I look at the graphs or query the metrics endpoint, I see something different:

Image
The metrics endpoint returns:

curl localhost:8080/metrics -s | grep karpenter_nodepools_allowed_disruptions | grep opensearch
karpenter_nodepools_allowed_disruptions{nodepool="opensearch-amd64-on-demand",reason="Empty"} 2
karpenter_nodepools_allowed_disruptions{nodepool="opensearch-amd64-on-demand",reason="Underutilized"} 0

As I understand it, the values should be almost the same, and disruptions should be forbidden during the scheduled window (so the metric should show 0 for that period), but the graph shows that the budgets differ for a long time.
No consolidation was happening at the time of this query.
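
To make the expectation above concrete, here is a minimal sketch of what I would expect the metric to show for the first NodePool at a given time. It assumes the schedule is evaluated in UTC and that the most restrictive active budget wins for each reason. The cron string "0 10 * * mon-fri" is decoded by hand into the hypothetical fields `start_hour` and `weekdays`; none of these names come from Karpenter's code, and the sketch ignores any deduction for nodes that are already being disrupted.

```python
from datetime import datetime, timedelta, timezone

# Budgets copied from the first NodePool spec above. "start_hour" and "weekdays"
# are a hand-decoded form of "0 10 * * mon-fri" (hypothetical field names,
# not Karpenter's data model). weekday() uses Monday=0 ... Friday=4.
BUDGETS = [
    {"nodes": 0, "reasons": {"Drifted", "Empty", "Underutilized"},
     "start_hour": 10, "weekdays": range(0, 5), "duration": timedelta(hours=12)},
    # No reasons and no schedule: assumed always active and applied to every reason.
    {"nodes": 2, "reasons": None},
]

def is_active(budget, now):
    """Return True if `now` (UTC) falls inside the budget's scheduled window."""
    if "start_hour" not in budget:
        return True
    # Check the window that started today and the one that started yesterday,
    # which is enough for durations of up to 24h.
    for days_back in (0, 1):
        day = (now - timedelta(days=days_back)).date()
        start = datetime(day.year, day.month, day.day,
                         budget["start_hour"], tzinfo=timezone.utc)
        if start.weekday() in budget["weekdays"] and start <= now < start + budget["duration"]:
            return True
    return False

def expected_allowed(reason, now):
    """Most restrictive node count among the active budgets that cover `reason`."""
    limits = [b["nodes"] for b in BUDGETS
              if is_active(b, now) and (b["reasons"] is None or reason in b["reasons"])]
    return min(limits)

wednesday_11 = datetime(2025, 7, 9, 11, 0, tzinfo=timezone.utc)   # inside the window
wednesday_23 = datetime(2025, 7, 9, 23, 0, tzinfo=timezone.utc)   # after the window
saturday_11 = datetime(2025, 7, 12, 11, 0, tzinfo=timezone.utc)   # weekend, no window
for when in (wednesday_11, wednesday_23, saturday_11):
    print(when.isoformat(), {r: expected_allowed(r, when) for r in ("Empty", "Underutilized")})
```

Under these assumptions, Empty and Underutilized should always report the same value: 0 between 10:00 and 22:00 UTC on weekdays, and 2 otherwise. That is not what the metrics endpoint returned above.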

Expected Behavior:
The metric should reflect the actual schedule, but the problem is that I'm not sure the schedule even works as it should.

Reproduction Steps (Please include YAML):
There is another example for another nodepool:

Image

spec:
  disruption:
    budgets:
    - nodes: "3"
      reasons:
      - Empty
      - Underutilized
    - nodes: "2"
      reasons:
      - Drifted
    - duration: 12h0m0s
      nodes: "0"
      reasons:
      - Drifted
      schedule: 0 12 * * mon-fri
    consolidateAfter: 30s
    consolidationPolicy: WhenEmptyOrUnderutilized

The graph shows that only Underutilized is active all the time, but not Empty.
Drifted is also inconsistent: it was allowing 2 nodes, but after the weekend
it reports 0.

curl localhost:8080/metrics -s | grep karpenter_nodepools_allowed_disruptions | grep bottle    
karpenter_nodepools_allowed_disruptions{nodepool="bottlerocket-nodepool",reason="Empty"} 3
karpenter_nodepools_allowed_disruptions{nodepool="bottlerocket-nodepool",reason="Underutilized"} 3

No "Drifted" series is reported at all, which is fine, but I'd prefer to have 0 reported.
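
For anyone who wants to check this on their own cluster, here is a small sketch that parses the metrics output and marks reasons that have no series at all. The `SAMPLE` text is the output pasted above; the `REASONS` tuple, the regular expression, and the `allowed_by_reason` helper are my own names, and the pattern assumes the label order shown in that output (`nodepool` first, then `reason`).

```python
import re

# Output of the curl command above for bottlerocket-nodepool.
SAMPLE = '''\
karpenter_nodepools_allowed_disruptions{nodepool="bottlerocket-nodepool",reason="Empty"} 3
karpenter_nodepools_allowed_disruptions{nodepool="bottlerocket-nodepool",reason="Underutilized"} 3
'''

# Assumes the label order seen in the output above: nodepool, then reason.
LINE = re.compile(
    r'^karpenter_nodepools_allowed_disruptions'
    r'\{nodepool="([^"]+)",reason="([^"]+)"\}\s+(\S+)$'
)

# Reasons used in the budgets of this issue.
REASONS = ("Drifted", "Empty", "Underutilized")

def allowed_by_reason(text, nodepool):
    """Map each reason to its reported value, or None if no series exists."""
    found = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match and match.group(1) == nodepool:
            found[match.group(2)] = float(match.group(3))
    return {reason: found.get(reason) for reason in REASONS}

print(allowed_by_reason(SAMPLE, "bottlerocket-nodepool"))
```

With the sample above, Drifted comes back as None rather than 0, which is the gap I mean: a missing series cannot be distinguished from a budget that was never evaluated.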

Versions:

  • Chart Version: 1.3.3
  • Kubernetes Version (kubectl version):
#kubectl version                                                                                                                               
Client Version: v1.33.2
Kustomize Version: v5.6.0
Server Version: v1.30.13-eks-5d4a308
WARNING: version difference between client (1.33) and server (1.30) exceeds the supported minor version skew of +/-1
Kubecolor Version: v0.5.0

Labels

  • kind/bug: Categorizes issue or PR as related to a bug.
  • priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.
