feat(core): virt-launcher logs process tree + D-state escalation for stuck VMs #2528
Draft
fl64 wants to merge 3 commits into
Point the kubevirt source artifact build at the feat/virt-launcher-proctree branch in deckhouse/3p-kubevirt to test the virt-launcher process-tree snapshot + D-state escalation patch before tagging. Test-only change; the .version string will read 'vfeat/virt-dirty', which is cosmetic and does not affect the build. Will be reverted to a proper v1.6.2-v12n.N tag once the proctree patch is merged into v1.6.2-virtualization.
Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>

Add installCacheVersion to the virt-artifact build so werf does not reuse a cached layer when testing the feat/virt-launcher-proctree dev branch of 3p-kubevirt. Without this, the clone+build step is cached and the proctree patch never makes it into the image.
Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>

virt-operator (SA kubevirt-operator) generates the ClusterRole kubevirt-internal-virtualization-controller with a pods/resize:update rule (upstream kubevirt 1.6.2 inplace resize feature, PR #103 in 3p-kubevirt). On apply, Kubernetes RBAC escalation protection blocks it because the operator's own ClusterRole d8:virtualization:kubevirt-operator does not hold that right, so the kubevirt install-strategy never rolls out and virt-handler/virt-controller stay on the old image. Add pods/resize:update to d8:virtualization:kubevirt-operator next to the existing pods/finalizers:update rule. Safe on all k8s versions: RBAC does not validate subresource existence; on clusters where the InPlacePodVerticalScaling subresource is absent, the rule is inert.
Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
Description
Bumps 3p-kubevirt to a revision that adds a lightweight periodic process-tree snapshot and D-state escalation to the virt-launcher monitor (pkg/virt-launcher/monitor.go).

The monitor loop already ticks every second and reads /proc/<pid>/status to detect zombie QEMU. This change extends that read to also detect the D (uninterruptible disk sleep) state, and adds:

- a periodic process-tree snapshot line:
  proctree: tini(1,S)→vl-monitor(2,S)→virt-launcher(3,S) virtqemud(4,S)→qemu(5,R)[vcpu0:R,vcpu1:R,io:R]
- a D-state escalation line that carries the wait channel and the kernel stack from /proc/<pid>/stack:
  qemu pid 113382 D-state 10s wchan=__drbd_make_request stack=__drbd_make_request+0x34f/0x610[drbd]→drbd_submit_bio+0x36f/0x3e0[drbd]→__submit_bio→submit_bio_noacct→__blkdev_direct_IO_async→blkdev_write_iter

All reads come straight from /proc: no new dependencies, no exec/shell. Behavior is additive; pidExists is retained as a backward-compatible wrapper over the new pidState, so existing callers and tests are untouched.

Why do we need it, and what problem does it solve?
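For readers who have not looked at the patch, the state read described in the Description can be pictured roughly as below. This is only a sketch under stated assumptions: parseState, readState and alive are hypothetical names chosen for illustration, and the actual signatures of pidState and pidExists in monitor.go are not shown in this PR. The only thing taken as given is the kernel's /proc/<pid>/status format, where the State line starts with a one-letter code (R, S, D, Z, ...).

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// parseState extracts the one-letter state code from the text of a
// /proc/<pid>/status file, e.g. "State:\tD (disk sleep)" yields 'D'.
// It returns 0 when no State line is found.
// Hypothetical helper, not the PR's pidState.
func parseState(status string) byte {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "State:") {
			continue
		}
		fields := strings.Fields(strings.TrimPrefix(line, "State:"))
		if len(fields) > 0 && len(fields[0]) > 0 {
			return fields[0][0]
		}
	}
	return 0
}

// readState reads /proc/<pid>/status for the given pid. An error usually
// means the process is gone (or /proc is unavailable on this platform).
func readState(pid int) (byte, error) {
	data, err := os.ReadFile(fmt.Sprintf("/proc/%d/status", pid))
	if err != nil {
		return 0, err
	}
	return parseState(string(data)), nil
}

// alive mirrors the wrapper idea from the description: an unknown or zombie
// state counts as "not existing", while a D-state process is still alive.
// Assumption: this is how a pidExists-style wrapper could sit on top of
// the state read; the real wrapper may differ.
func alive(state byte) bool {
	return state != 0 && state != 'Z'
}

func main() {
	state, err := readState(os.Getpid())
	if err != nil {
		fmt.Println("status not readable:", err)
		return
	}
	fmt.Printf("self state=%c alive=%t\n", state, alive(state))
}
```

Keeping the parsing separate from the file read makes the state logic testable with fixture strings, which fits the claim that existing callers and tests stay untouched.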
When a QEMU process enters D-state (e.g. stuck inside __drbd_make_request during a DRBD replication stall, or blocked on a CSI/NFS volume), SIGKILL cannot take effect: the signal stays pending while the process is in TASK_UNINTERRUPTIBLE inside a kernel syscall. The d8v-vm-* pod then hangs in Terminating for days: kubelet retries StopContainer, each attempt times out with DeadlineExceeded, the pod's pod-protection finalizer never clears, and the VirtualMachine vm-cleanup finalizer never progresses.

Until now, the virt-launcher logs showed only Refreshing. domainName ... pid <N> repeated forever, with zero visibility into why the container wouldn't die. Diagnosing it required SSH onto the hypervisor and hand-parsing pstree + /proc/<pid>/stack + /proc/<pid>/wchan.

With this change, the same evidence is in kubectl logs -n <ns> <d8v-vm-pod> -c d8v-compute the moment the hang starts. One grep returns the full chain: normal tree → the moment QEMU enters D → the kernel stack pointing at the stuck subsystem (DRBD/NFS/CSI) → the kubelet SIGTERM that can't take effect.

What is the expected result?
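To show what replaces the manual hand-parsing of /proc/<pid>/stack, here is a rough sketch of collapsing a raw stack dump into a single arrow-joined field like the stack= value in the sample log line. It is illustrative only: flattenStack is a hypothetical name, the input is assumed to use the usual "[<0>] symbol+off/len [module]" layout of /proc/<pid>/stack, and the patch itself may format frames differently.

```go
package main

import (
	"fmt"
	"strings"
)

// flattenStack turns raw /proc/<pid>/stack text into one arrow-joined line.
// Assumption: each non-empty input line looks like
// "[<0>] symbol+0x1/0x2 [module]"; the "[<...>]" column is dropped and
// inner spaces are removed so the result is a single grep-able token.
// Hypothetical helper, not taken from the patch.
func flattenStack(raw string) string {
	var frames []string
	for _, line := range strings.Split(raw, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		// Drop the leading "[<address>]" column if present.
		if strings.HasPrefix(line, "[<") {
			if end := strings.Index(line, ">]"); end >= 0 {
				line = strings.TrimSpace(line[end+2:])
			}
		}
		frames = append(frames, strings.ReplaceAll(line, " ", ""))
	}
	return strings.Join(frames, "→")
}

func main() {
	// Frames copied from the sample log line in this PR, re-expanded into
	// the assumed raw layout.
	raw := "[<0>] __drbd_make_request+0x34f/0x610 [drbd]\n" +
		"[<0>] drbd_submit_bio+0x36f/0x3e0 [drbd]\n"
	fmt.Println("stack=" + flattenStack(raw))
}
```

Emitting the whole stack as one token on one log line is what lets a single grep for D-state return the stuck subsystem together with the pid and duration.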
- pidExists zombie handling is preserved.
- kubectl logs <d8v-vm-pod> -c d8v-compute | grep -E "D-state|proctree" is the one-command diagnosis for a stuck-Terminating VM.

Checklist
Changelog entries