Restart driver pods in place when driver config is unchanged #2527
Merged
rajathagasthya merged 1 commit on Jun 30, 2026
cdesiniotis reviewed Jun 9, 2026
ce94262 to 5165328
5165328 to 6144fb2
tariq1890 reviewed Jun 23, 2026
3feaaf8 to 40151ff
40151ff to 9b0cd08
tariq1890 reviewed Jun 29, 2026
9b0cd08 to d743da5
cdesiniotis approved these changes Jun 30, 2026
LGTM, thanks @rajathagasthya. Let's get another approval on this.
tariq1890 reviewed Jun 30, 2026
A patch chart upgrade can change only cosmetic pod-template metadata (e.g. the helm.sh/chart label) without changing the driver itself. The upgrade controller keys on the DaemonSet's controller revision hash, so such a change still evicts running GPU workloads and drains the node -- for no driver benefit.

Register a RestartOnlyPredicate on the upgrade state manager that compares DRIVER_CONFIG_DIGEST -- a hash of the install-relevant driver config, already set on the driver pod template -- between the running pod and the desired DaemonSet. When the digests match, the node is cordoned and the driver pod restarted in place, with no workload eviction or drain; the driver fast-path keeps the kernel modules loaded across the restart, so running GPU workloads are not disrupted. Cordoning keeps the node unschedulable if the restart fails, and the node is uncordoned on success. A missing or differing digest falls back to the full upgrade flow.

The digest env name and a reader for it live in internal/config beside the digest definition; the restart-only routing decision lives in internal/predicates and is registered on the upgrade state manager in main.go. The RestartOnlyPredicate hook it relies on is provided by k8s-operator-libs, vendored here at the merged version.

Signed-off-by: Rajath Agasthya <ragasthya@nvidia.com>
d743da5 to 0bbf881
tariq1890 approved these changes Jun 30, 2026
Description
A patch chart upgrade can change only cosmetic pod-template metadata (e.g. the `helm.sh/chart` label) without changing the driver itself. The upgrade controller keys on the DaemonSet's controller revision hash, so such a change still evicts running GPU workloads and drains the node, disrupting those workloads for no driver benefit.

Register a `RestartOnlyPredicate` on the upgrade state manager (from the `UpgradeReconciler`) that compares `DRIVER_CONFIG_DIGEST`, a hash of the install-relevant driver config that is already set on the driver pod template, between the running pod and the desired DaemonSet. When the digests match, the node is cordoned and the driver pod is restarted in place, with no workload eviction or drain; the driver fast-path keeps the kernel modules loaded across the restart, so running GPU workloads are not disrupted. Cordoning keeps the node unschedulable if the restart fails (same as the full flow), and the node is uncordoned on success. A missing or differing digest falls back to the full upgrade flow.

If the predicate returns an error or the cordon fails, the node stays in `upgrade-required` and is retried on a later reconcile (with a Warning event), rather than being routed to the disruptive flow on an unknown answer.

Known limitation: the first upgrade from a release without restart-only is still disruptive, because the old operator holds the leader-election lease and routes the upgrade before the new operator becomes leader. Steady state (both the old and the new operator have the code) is non-disruptive.
Related to NVIDIA/k8s-operator-libs#145
Checklist
- `make lint`
- `make validate-generated-assets`
- `make validate-modules`

Testing
Unit tests:
- `internal/config`: `TestDriverConfigDigestFromPodSpec`, covering the digest reader, incl. nil/empty/container-precedence cases.
- `controllers`: `TestDriverPodRestartOnly`, covering predicate routing, incl. nil pod/DS and missing/equal/differing digests.

Manual testing (single-node cluster, GPU workload running throughout):
- `upgrade-required` → `pod-restart-required` (cordoned, never `cordon-required`), the driver pod restarts via the fast path, and the GPU workload is not evicted.
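
The digest-reader cases named in the unit tests (nil, empty, container precedence) can be illustrated with a rough sketch of such a reader. Everything here is an assumption for illustration: `envVar`, `container`, and `podSpec` are simplified stand-ins for the Kubernetes API types, `digestFromPodSpec` is a hypothetical name, and the precedence rule (the first container in order that sets a non-empty value wins) is a guess, since the PR text does not say which container takes precedence.

```go
package main

import "fmt"

// Env variable name taken from the PR description.
const driverConfigDigestEnv = "DRIVER_CONFIG_DIGEST"

// Simplified stand-ins for the Kubernetes EnvVar, Container, and PodSpec types.
type envVar struct {
	Name  string
	Value string
}

type container struct {
	Name string
	Env  []envVar
}

type podSpec struct {
	Containers []container
}

// digestFromPodSpec returns the digest from the first container that sets a
// non-empty DRIVER_CONFIG_DIGEST, or "" when the spec is nil or no container
// sets it. The precedence rule is an assumption made for this sketch.
func digestFromPodSpec(spec *podSpec) string {
	if spec == nil {
		return ""
	}
	for _, c := range spec.Containers {
		for _, e := range c.Env {
			if e.Name == driverConfigDigestEnv && e.Value != "" {
				return e.Value
			}
		}
	}
	return ""
}

func main() {
	cases := []struct {
		name string
		spec *podSpec
	}{
		{"nil spec", nil},
		{"no containers", &podSpec{}},
		{"empty value", &podSpec{Containers: []container{
			{Name: "driver", Env: []envVar{{Name: driverConfigDigestEnv, Value: ""}}},
		}}},
		{"first container wins", &podSpec{Containers: []container{
			{Name: "driver", Env: []envVar{{Name: driverConfigDigestEnv, Value: "aaa"}}},
			{Name: "sidecar", Env: []envVar{{Name: driverConfigDigestEnv, Value: "bbb"}}},
		}}},
	}
	for _, tc := range cases {
		fmt.Printf("%s: %q\n", tc.name, digestFromPodSpec(tc.spec))
	}
}
```

Returning an empty string for every "no digest" case lets the caller treat nil, empty, and missing uniformly as "fall back to the full upgrade flow".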