feat(core): virt-launcher logs process tree + D-state escalation for stuck VMs#2528

Draft
fl64 wants to merge 3 commits into main from feat/virt-launcher-proctree

@fl64 fl64 commented Jun 23, 2026

Description

Bumps 3p-kubevirt to a revision that adds a lightweight periodic process-tree snapshot and D-state escalation to the virt-launcher monitor (pkg/virt-launcher/monitor.go).

The monitor loop already ticks every second and reads /proc/<pid>/status to detect zombie QEMU. This change extends that read to also detect the D (uninterruptible disk sleep) state, and adds:

  • A periodic (30s, V(3)) one-line process-tree snapshot:
    proctree: tini(1,S)→vl-monitor(2,S)→virt-launcher(3,S) virtqemud(4,S)→qemu(5,R)[vcpu0:R,vcpu1:R,io:R]
  • A D-state escalation line at default log level (not V-gated) with wchan and best-effort /proc/<pid>/stack:
    qemu pid 113382 D-state 10s wchan=__drbd_make_request stack=__drbd_make_request+0x34f/0x610[drbd]→drbd_submit_bio+0x36f/0x3e0[drbd]→__submit_bio→submit_bio_noacct→__blkdev_direct_IO_async→blkdev_write_iter
  • Throttled to one escalation line per 10s (a hang lasts minutes/hours, so the signal is never lost).

All reads are pure /proc, no new dependencies, no exec/shell. Behavior is additive; pidExists is retained as a backward-compatible wrapper over the new pidState, so existing callers and tests are untouched.

Why do we need it, and what problem does it solve?

When a QEMU process enters D-state (for example, stuck inside __drbd_make_request during a DRBD replication stall, or blocked on a CSI/NFS volume), SIGKILL cannot take effect: the signal stays pending while the task sits in TASK_UNINTERRUPTIBLE inside the kernel. The d8v-vm-* pod then hangs in Terminating for days: kubelet retries StopContainer, each attempt times out with DeadlineExceeded, the pod's pod-protection finalizer never clears, and the VirtualMachine vm-cleanup finalizer never progresses.

Until now, the virt-launcher logs showed only Refreshing. domainName ... pid <N> repeated forever — zero visibility into why the container wouldn't die. Diagnosing it required SSH onto the hypervisor and hand-parsing pstree + /proc/<pid>/stack + /proc/<pid>/wchan.

With this change, the same evidence is in kubectl logs -n <ns> <d8v-vm-pod> -c d8v-compute the moment the hang starts. One grep returns the full chain: normal tree → the moment QEMU enters D → the kernel stack pointing at the stuck subsystem (DRBD/NFS/CSI) → the kubelet SIGTERM that can't take effect.
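
The kernel-side evidence mentioned above (wchan plus stack) can be collected with plain file reads. The sketch below is an illustration under assumptions, not the patch itself: `formatStack` and `readKernelContext` are hypothetical names, and the exact frame formatting in the real escalation line may differ. It relies on the usual `/proc/<pid>/stack` layout of one frame per line, such as `[<0>] func+0x34f/0x610 [module]`.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// formatStack turns the raw contents of /proc/<pid>/stack into one line,
// dropping the leading "[<addr>]" column and joining frames with an arrow.
func formatStack(raw string) string {
	var frames []string
	for _, line := range strings.Split(raw, "\n") {
		line = strings.TrimSpace(line)
		if strings.HasPrefix(line, "[<") {
			if i := strings.Index(line, ">]"); i >= 0 {
				line = strings.TrimSpace(line[i+2:])
			}
		}
		if line == "" {
			continue
		}
		// Glue the module suffix to the symbol: "sym [drbd]" -> "sym[drbd]".
		frames = append(frames, strings.ReplaceAll(line, " [", "["))
	}
	return strings.Join(frames, "→")
}

// readKernelContext reads wchan and stack on a best-effort basis.
// /proc/<pid>/stack normally needs elevated privileges, so a failed read
// leaves the field empty instead of returning an error.
func readKernelContext(procRoot string, pid int) (wchan, stack string) {
	if b, err := os.ReadFile(fmt.Sprintf("%s/%d/wchan", procRoot, pid)); err == nil {
		wchan = strings.TrimSpace(string(b))
	}
	if b, err := os.ReadFile(fmt.Sprintf("%s/%d/stack", procRoot, pid)); err == nil {
		stack = formatStack(string(b))
	}
	return wchan, stack
}

func main() {
	// Sample input built from the first two frames of the example line.
	raw := "[<0>] __drbd_make_request+0x34f/0x610 [drbd]\n" +
		"[<0>] drbd_submit_bio+0x36f/0x3e0 [drbd]\n"
	fmt.Println("stack=" + formatStack(raw))

	wchan, stack := readKernelContext("/proc", os.Getpid())
	fmt.Printf("self: wchan=%q stack=%q\n", wchan, stack)
}
```

Treating both reads as optional matches the "best-effort" wording: when the stack file is unreadable, the escalation line can still carry the wchan value.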

What is the expected result?

  • A VM whose QEMU is stuck in D-state produces a clear, throttled escalation line in the virt-launcher pod logs, naming the kernel function it's blocked in.
  • A periodic process-tree snapshot is available at V(3) for baseline comparison.
  • No change to normal (healthy) VM lifecycle, shutdown, or restart behavior. pidExists zombie handling is preserved.
  • kubectl logs <d8v-vm-pod> -c d8v-compute | grep -E "D-state|proctree" is the one-command diagnosis for a stuck-Terminating VM.

Checklist

  • The code is covered by unit tests.
  • e2e tests passed.
  • Documentation updated according to the changes.
  • Changes were tested in the Kubernetes cluster manually.

Changelog entries

section: core
type: feature
summary: "virt-launcher now logs a periodic process-tree snapshot and escalates D-state QEMU hangs with wchan + kernel stack, so a VM stuck in Terminating is diagnosable from pod logs without node SSH."
impact_level: low

Point the kubevirt source artifact build at the feat/virt-launcher-proctree
branch in deckhouse/3p-kubevirt to test the virt-launcher process-tree
snapshot + D-state escalation patch before tagging.

Test-only change; the .version string will read 'vfeat/virt-dirty' — cosmetic,
does not affect the build. Will be reverted to a proper v1.6.2-v12n.N tag once
the proctree patch is merged into v1.6.2-virtualization.

Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
@fl64 fl64 changed the title from "chore(core): bump 3p-kubevirt to feat/virt-launcher-proctree for testing" to "feat(core): virt-launcher logs process tree + D-state escalation for stuck VMs" Jun 23, 2026
fl64 added 2 commits June 23, 2026 20:56
Add installCacheVersion to the virt-artifact build so werf does not reuse a
cached layer when testing the feat/virt-launcher-proctree dev branch of
3p-kubevirt. Without this, the clone+build step is cached and the proctree
patch never makes it into the image.

Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
virt-operator (SA kubevirt-operator) generates the ClusterRole
kubevirt-internal-virtualization-controller with a pods/resize:update rule
(upstream kubevirt 1.6.2 inplace resize feature, PR #103 in 3p-kubevirt). On
apply, Kubernetes RBAC escalation protection blocks it because the operator's
own ClusterRole d8:virtualization:kubevirt-operator does not hold that right,
so the kubevirt install-strategy never rolls out and virt-handler/virt-controller
stay on the old image.

Add pods/resize:update to d8:virtualization:kubevirt-operator next to the
existing pods/finalizers:update rule.

Safe on all k8s versions: RBAC does not validate subresource existence; on
clusters where the InPlacePodVerticalScaling subresource is absent, the rule
is inert.

Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>