topology-updater: Track new resources not present during startup #1897
base: master
Conversation
Hi @ozhuraki. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label.
Could you please take a look at whether this is the right place to track new resources?
I don't think that's the right place to fix it, as a later call to …
Force-pushed 001c2e7 to 230fc67
Thanks for the help. I looked into it more; would something like this account for updateAvailable()?
Signed-off-by: Oleg Zhurakivskyy <[email protected]>
Force-pushed 230fc67 to 0c8a8fa
Apologies for the delay. Thanks, updated; could you please take another look?
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
@PiotrProkop could you take a look?
/ok-to-test
@ozhuraki please fix the linter error(s)
@ozhuraki please add a unit test for this functionality. I'll try to review more in depth today/tomorrow.
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closed this PR.
/reopen
@marquiz: Reopened this PR.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: ozhuraki. Needs approval from an approver in each of the affected OWNERS files; approvers can indicate their approval by writing /approve in a comment.
@ozhuraki: The following tests failed; say /retest to rerun all failed tests. Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
@ozhuraki any update on this one?
/remove-lifecycle rotten
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
Pull request overview
This PR aims to fix an issue where the NFD topology-updater doesn't track new resources from device plugins that register with kubelet after the updater has already started. The change adds logic to dynamically track and map new resources to NUMA nodes when they are first encountered during the available resource update process.
Changes:
- Adds dynamic resource tracking in updateAvailable to handle device plugins that register after startup
- Creates resource-to-NUMA-node mappings on the fly when new resources are detected (a rough sketch follows below)
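As a rough illustration of the approach described above: when updateAvailable sees a resource name with no entry yet in resourceID2NUMAID, a mapping can be built on the fly from the reported device IDs and NUMA node IDs. This is a minimal, self-contained sketch, not the PR's actual code; the resourceInfo and nodeResources types are simplified stand-ins, and only the names resourceID2NUMAID, Data, NumaNodeIds, resMap, and resID come from the diff context quoted in the review.

```go
package main

import "fmt"

// resourceInfo mimics the per-resource data reported for a running pod.
type resourceInfo struct {
	Name        string
	Data        []string // device IDs, e.g. "dev0", "dev1"
	NumaNodeIds []int
}

// nodeResources mimics the aggregator's cached state.
type nodeResources struct {
	// resourceID2NUMAID maps resource name -> device ID -> NUMA node ID.
	resourceID2NUMAID map[string]map[string]int
}

// trackNewResource registers devices of a resource that was not seen at startup.
func (nr *nodeResources) trackNewResource(ri resourceInfo) {
	resMap, ok := nr.resourceID2NUMAID[ri.Name]
	if !ok {
		resMap = make(map[string]int)
		nr.resourceID2NUMAID[ri.Name] = resMap
	}
	for _, numaNodeID := range ri.NumaNodeIds {
		for _, resID := range ri.Data {
			// Only record a device the first time it is seen; see the
			// review discussion below about device-to-node pairing.
			if _, exists := resMap[resID]; !exists {
				resMap[resID] = numaNodeID
			}
		}
	}
}

func main() {
	nr := nodeResources{resourceID2NUMAID: map[string]map[string]int{}}
	nr.trackNewResource(resourceInfo{
		Name:        "vendor.com/device",
		Data:        []string{"dev0", "dev1"},
		NumaNodeIds: []int{0},
	})
	fmt.Println(nr.resourceID2NUMAID)
}
```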
Comments suppressed due to low confidence (1)
pkg/resourcemonitor/noderesourcesaggregator.go:281
- This line will panic with a nil pointer dereference if numaData[nodeID][ri.Name] doesn't exist, which is the exact scenario this PR is trying to handle (new resources not present during startup). Even if the resourceID2NUMAID mapping is successfully created in the new code block, the corresponding entry in numaData must also exist before decrementing available. The new resource tracking logic is incomplete without also initializing the resource entry in the numaData structure.
numaData[nodeID][ri.Name].available--
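For illustration, a minimal sketch of the guard this comment asks for, assuming a simplified resourceData type (the real NFD structure may differ): the per-NUMA entry is created before available is decremented, so a resource first seen after startup does not trigger a nil-pointer panic.

```go
package main

import "fmt"

type resourceData struct {
	allocatable int64
	available   int64
	capacity    int64
}

func main() {
	// numaData mirrors the shape implied by the quoted line:
	// NUMA node ID -> resource name -> counters.
	numaData := map[int]map[string]*resourceData{0: {}}
	nodeID, resName := 0, "vendor.com/device"

	// Without this initialization, numaData[nodeID][resName] is nil for a
	// resource that appeared after startup, and the decrement would panic.
	if _, ok := numaData[nodeID][resName]; !ok {
		numaData[nodeID][resName] = &resourceData{allocatable: 1, available: 1, capacity: 1}
	}
	numaData[nodeID][resName].available--

	fmt.Printf("%+v\n", *numaData[nodeID][resName])
}
```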
    continue
}

for _, resID := range ri.Data {
Copilot AI · Jan 15, 2026
The nested loop structure incorrectly maps all resource IDs to each NUMA node in sequence, causing the last NUMA node to overwrite previous mappings. If ri.NumaNodeIds contains multiple nodes [0, 1] and ri.Data contains ['dev0', 'dev1'], both devices will incorrectly be mapped only to the last NUMA node (1). The ResourceInfo structure doesn't provide device-to-node pairing, so this approach cannot correctly establish the mapping. Consider whether the ResourceInfo needs to be enhanced to include explicit device-to-NUMA-node pairing, or if the mapping logic needs to rely on the original topology information from the Pod Resources API.
Suggested change:
for _, resID := range ri.Data {
    // Only set the mapping if this resource ID has not been mapped yet.
    if _, exists := resMap[resID]; exists {
        continue
    }
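A standalone toy example of the overwrite described above (all names here are invented for illustration): iterating the NUMA node IDs in the outer loop means the last node wins for every device ID, so per-device placement is lost.

```go
package main

import "fmt"

func main() {
	numaNodeIDs := []int{0, 1}            // plays the role of ri.NumaNodeIds
	deviceIDs := []string{"dev0", "dev1"} // plays the role of ri.Data

	mapping := map[string]int{}
	for _, node := range numaNodeIDs {
		for _, dev := range deviceIDs {
			mapping[dev] = node // each outer pass overwrites the previous node
		}
	}
	// Both devices end up mapped to NUMA node 1, regardless of where they live.
	fmt.Println(mapping) // map[dev0:1 dev1:1]
}
```

Note that even with the "first mapping wins" guard from the suggestion, both devices would simply land on the first listed NUMA node, which is why the comment questions whether ResourceInfo can express the device-to-node pairing at all.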
resMap, ok := noderesourceData.resourceID2NUMAID[resName]
if !ok {
Copilot AI · Jan 15, 2026
This lookup is redundant since resMap was already retrieved (and potentially populated) at lines 243-263. The variable resMap is still in scope and can be reused directly, eliminating the need for this duplicate lookup.
if _, ok := numaData[numaNodeID][ri.Name]; !ok {
    klog.InfoS("failed to find resource under the node topology", "resourceName", ri.Name)
    continue
}
Copilot AI · Jan 15, 2026
This check verifies if the resource exists in numaData before adding it to the resourceID2NUMAID map, but the resource may legitimately not exist yet in numaData if this is a newly registered device plugin. The check at line 252 will cause the function to skip adding new resources to the map, which defeats the purpose of this PR. The perNuma/numaData structure is built from perNUMAAllocatable which is initialized only at startup from GetAllocatableResources, so new resources won't be present there until the aggregator is recreated. This condition should be reconsidered or removed.
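A rough sketch of the direction this comment points at, using simplified stand-in types rather than the PR's or NFD's actual code: when a resource is missing from numaData, initialize both the per-NUMA accounting entry and the ID-to-node mapping instead of skipping it, so that devices from late-registering plugins are accounted for.

```go
package main

import "fmt"

type resourceData struct {
	allocatable int64
	available   int64
	capacity    int64
}

func main() {
	numaData := map[int]map[string]*resourceData{0: {}}
	resourceID2NUMAID := map[string]map[string]int{}

	numaNodeID, resName, devID := 0, "vendor.com/device", "dev0"

	// Instead of `continue`, create the NUMA-level accounting entry for the
	// newly seen resource...
	if _, ok := numaData[numaNodeID][resName]; !ok {
		numaData[numaNodeID][resName] = &resourceData{allocatable: 1, available: 1, capacity: 1}
	}
	// ...and record which NUMA node the device ID belongs to.
	if _, ok := resourceID2NUMAID[resName]; !ok {
		resourceID2NUMAID[resName] = map[string]int{}
	}
	resourceID2NUMAID[resName][devID] = numaNodeID

	fmt.Println(resourceID2NUMAID, *numaData[numaNodeID][resName])
}
```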
If a device plugin registers itself with kubelet after the NFD topology-updater is already running, the NRT will never be updated with the new resources.
Closes: #1424